India International Program Teleconference -- 10 Aug 2018

Agenda and Minutes

alolita: review new issues, look at action items, start on comments

Review of any pending action items

<alolita> https://www.w3.org/International/groups/indic-layout/track/actions/open

<scribe> A6: Complete devanagari and bengali sections 2.6

Complete devanagari and bengali sections 2.6 PENDING

Vivek to provide information about Bengali

A8: Add issue about zwj/zwnj stuff to begin fleshing out the problem

https://github.com/w3c/iip/issues/14

close action-8

<trackbot> Closed action-8.

A9: Add issue about devanagari numerals to help provide use case examples

https://github.com/w3c/iip/issues/15

vivek: native numerals are sometimes used, but i'm unable to see this as a gap - there is a straightforward mapping
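A minimal sketch of the kind of mapping vivek refers to, assuming it is the one-to-one correspondence between the ASCII digits 0-9 and the Devanagari digits U+0966 to U+096F. The function names are illustrative and were not discussed on the call.

```python
# ASCII digits 0-9 correspond one-to-one to Devanagari digits U+0966..U+096F.
ASCII_DIGITS = "0123456789"
DEVANAGARI_DIGITS = "".join(chr(0x0966 + i) for i in range(10))

# Translation tables in both directions (hypothetical helper names).
TO_DEVANAGARI = str.maketrans(ASCII_DIGITS, DEVANAGARI_DIGITS)
TO_ASCII = str.maketrans(DEVANAGARI_DIGITS, ASCII_DIGITS)


def to_devanagari_digits(text: str) -> str:
    """Replace every ASCII digit in text with its Devanagari counterpart."""
    return text.translate(TO_DEVANAGARI)


def to_ascii_digits(text: str) -> str:
    """Replace every Devanagari digit in text with its ASCII counterpart."""
    return text.translate(TO_ASCII)
```

Only digit characters are swapped here; grouping, calendars and date formats are separate questions that a digit mapping does not address.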

akshat: on previous call we discussed initial text and there was some confusion about what Vivek was trying to say

<alolita> akshat: there is w3c css spec support for calendar, date support in Devanagari and Latin

akshat: if i want to choose a devanagari calendar it should not be dependent on the developer, but specified by w3c
... there's some confusion about what is being said

close action-9

<trackbot> Closed action-9.

alolita: please all add to the github issue

A10: Add text to 2.8 about general problems for segmentation PENDING

akshat: we'll add that today

Discussion of comments in issues posted on GitHub for Devanagari, Bengali and Tamil.

alolita: there was some issue about adding Muthu's comments to github, so we should add after

muthu: i pointed out some areas that need attention

<alolita> muthu: there are 4 locales for tamil

<alolita> muthu: for locales ta_MY and ta_SG, a Latin-oriented format is used for numerals

[[

2.7 Numbers, dates, etc

The usage of Tamil numerals has fallen out of common usage, though we do find them used occasionally by a few. ASCII numerals are used in common practice, and thus should be the default or fallback when there are no options available.

ta_MY and ta_SG follow the English number format (123,456,789,000) and do not follow the number format used in ta_IN and ta_LK.

]]
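A small sketch contrasting the two digit-grouping conventions. The comment above does not say which format ta_IN and ta_LK use; the sketch ASSUMES it is the Indian grouping (last three digits, then groups of two). The function names are hypothetical.

```python
def group_western(n: int) -> str:
    """Group digits in threes, e.g. 123,456,789,000."""
    return f"{n:,}"


def group_indian(n: int) -> str:
    """Group the last three digits, then pairs (ASSUMED ta_IN/ta_LK style)."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Peel off two digits at a time from the right of the remaining head.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return sign + ",".join(groups + [tail])


print(group_western(123456789000))  # 123,456,789,000
print(group_indian(123456789000))   # 1,23,45,67,89,000
```

Both functions emit ASCII digits, in line with the comment that ASCII numerals are the common default for Tamil.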

alolita: tamil numerals only used in classical texts?

muthu: correct
... i think also in malayalam and telugu not used
... in kannada they are used

alolita: should both be available?

muthu: generally only ascii needed, but it would be nice to have an option for users to use native numbers
... there are some who may want to read in tamil numerals

alolita: how about in calendars?

muthu: all ascii for tamil

<alolita> richard: originally tamil did not have a zero

richard: if people want to use tamil numerals, would they use them in a decimal-based system?

muthu: yes
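As an illustration of decimal-based use, the sketch below maps ASCII digits to the Tamil digits U+0BE6 to U+0BEF and uses Python's unicodedata module to show how Unicode distinguishes them from the separate Tamil sign for ten (U+0BF0). The helper names are hypothetical; nothing here was specified on the call.

```python
import unicodedata

# Tamil decimal digits zero..nine occupy U+0BE6..U+0BEF.
TAMIL_DIGITS = "".join(chr(0x0BE6 + i) for i in range(10))
TO_TAMIL = str.maketrans("0123456789", TAMIL_DIGITS)


def to_tamil_digits(n: int) -> str:
    """Write n positionally (base 10) using Tamil digits."""
    return str(n).translate(TO_TAMIL)


def is_decimal_digit(ch: str) -> bool:
    """True if Unicode assigns ch a decimal digit value (usable positionally)."""
    return unicodedata.decimal(ch, None) is not None


tamil_2018 = to_tamil_digits(2018)
# int() accepts any Unicode decimal digits, so the round trip works.
assert int(tamil_2018) == 2018

# U+0BF0 TAMIL NUMBER TEN carries the numeric value 10 but is not a decimal
# digit, so it does not take part in positional notation.
assert not is_decimal_digit("\u0BF0")
assert unicodedata.numeric("\u0BF0") == 10.0
```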

<alolita> muthu: old books from 50+ years ago have tamil numerals

neha: if i want to display numbers in tamil there should be some tag to change numbers to tamil

muthu: if such a tag is not provided, then ascii numbers should be used

neha: that is a gap right now - no tag to switch to tamil numerals

alolita: we have noted that there is a gap that needs to be addressed

muthu: all the ta locales are the same, including sinhalese

[[

2.8 Text boundaries & selection

There are only two sequences of characters that form conjuncts in Tamil: ஶ்ரீ and க்ஷ. Neither is native to Tamil. Other than these two, no other CHC combinations form conjuncts. We should be able to place the cursor between the H and C (eg: CH<cursor>C). This issue was fixed in Android Oreo and iOS 12. The problem exists in many places and needs checking to identify which browsers support this and which do not.

]]
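The behaviour described in this comment can be sketched as a simplified cluster splitter. This is not the UAX #29 algorithm and not what any browser implements; it ASSUMES that every combining mark (Unicode category Mn or Mc) attaches to the preceding base, and that only the two sequences named above stay together across the pulli. All names are hypothetical.

```python
import unicodedata

# The two conjunct sequences named in the comment, as code points:
#   U+0B95 U+0BCD U+0BB7          (ka + pulli + ssa)
#   U+0BB6 U+0BCD U+0BB0 U+0BC0   (sha + pulli + ra + vowel sign ii)
CONJUNCTS = ("\u0B95\u0BCD\u0BB7", "\u0BB6\u0BCD\u0BB0\u0BC0")


def tamil_clusters(text: str) -> list:
    """Split text into units between which a cursor could be placed."""
    clusters = []
    i = 0
    while i < len(text):
        start = i
        # Keep a listed conjunct whole; otherwise take a single base character.
        conjunct = next((c for c in CONJUNCTS if text.startswith(c, i)), None)
        i += len(conjunct) if conjunct else 1
        # Attach following marks (pulli, vowel signs) to the current unit.
        while i < len(text) and unicodedata.category(text[i]) in ("Mn", "Mc"):
            i += 1
        clusters.append(text[start:i])
    return clusters


# pa, ka+pulli, ka, ma+pulli: a boundary falls between the pulli and the
# following consonant (CH<cursor>C).
print(tamil_clusters("\u0BAA\u0B95\u0BCD\u0B95\u0BAE\u0BCD"))
```

With this rule the cursor can land after the pulli in ordinary CHC sequences, while the two conjuncts remain single units.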

alolita: if you want to translate a historical text into tamil how will it be translated? with or without conjuncts?

muthu: in modern languages they write phonetically and pulli remains visible

r12a: https://github.com/w3c/ilreq/issues/31 is a related issue

<scribe> ACTION: r12a to raise tamil segmentation issue in our repo

<trackbot> Created ACTION-11 - Raise tamil segmentation issue in our repo [on Richard Ishida - due 2018-08-17].

alolita: so this issue is fixed in recent platforms - you can now put the cursor between

muthu: yes

neha: the segmentation rules for akshara @@@

https://w3c.github.io/ilreq/#h_indic_orthographic_syllable_boundaries

vivek: tamil doesn't fall in line with other scripts for handling of clusters

muthu summarises neha and akshat

muthu: ilreq has already specified the halant cluster model - vivek is saying that doesn't cover tamil because it's a different

<alolita> akshat: there are 2 definitions of akshara

<alolita> akshat: one definition refers to one encoding for all indian scripts

<alolita> akshat: this is the IS13194 definition

akshat: there are two actual definitions today; iscii (IS 13194) lists all conjuncts

<alolita> akshat: the other definition is from unicode

akshat: when unicode came around it split the individual scripts into separate code pages, unlike iscii

<alolita> akshat: unicode instead allocated different code pages for each indian language script

<alolita> akshat: in the ilreq document, the scripts and segmentation definitions are not clear

akshat: ilreq doc is unicode specific but doesn't clarify in terms of what scripts are supported - the definition is oriented towards devanagari languages, except for santali
... but bengali, malayalam, gurmukhi requirements are not captured by ilreq
... for tamil we don't need new categories to add to this definition
... definition talks about CHC but in tamil it's only applicable for the two conjuncts

alolita: going back to muthu and vivek, there should be a clear definition for tamil so that can be used as foundation for unicode
... having the clarification of differences is needed - that's a gap

[[

2.10.1 Syllable/Akshara spacing

Need to understand what is meant by: Consonant+Matra+Matra, the breaking seems to stack ill formed akshara into one set instead of clearly breaking it separate. This breaking behaviour needs to improve.

Consonant+Matra+Matra is valid in Tamil

]]
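One way to see why Consonant+Matra+Matra can be valid in Tamil is canonical decomposition: the two-part vowel sign U+0BCA decomposes into U+0BC6 followed by U+0BBE. The sketch below checks this with Python's unicodedata module. It shows only this one equivalence and does not settle which sequences the ilreq text has in mind; the function name is hypothetical.

```python
import unicodedata


def vowel_signs(cluster: str) -> list:
    """Names of the combining marks in the decomposed (NFD) form of cluster."""
    decomposed = unicodedata.normalize("NFD", cluster)
    return [
        unicodedata.name(ch)
        for ch in decomposed
        if unicodedata.category(ch) in ("Mn", "Mc")
    ]


# ka + vowel sign o (U+0B95 U+0BCA) is canonically equivalent to
# ka + vowel sign e + vowel sign aa (U+0B95 U+0BC6 U+0BBE),
# that is, a consonant followed by two matras.
syllable_ko = "\u0B95\u0BCA"
print(vowel_signs(syllable_ko))
```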

<alolita> vivek: eastern scripts (bengali, oriya) and southern scripts both support split matras

vivek: describes use of matras...
... this is a massive bug and common to most of our languages

<alolita> vivek: there is no clear definition - consonant+recursive-matras should not be allowed

<alolita> ... the unicode spec should be corrected to reflect this

<alolita> consonant+matra+matra is allowed in unicode

akshat: when you say that multiple matras are allowed in unicode - is this application specific ?

<alolita> ... open type also supports this unicode definition

akshat: that's an implementation issue rather than unicode issue

vivek: please point to the part of the unicode standard that describes this

<alolita> akshat: clear rules for syllable boundaries need to be defined

akshat: whatever unicode says is in ch12 but it doesn't specify what should join and what should not

<C-DAC_GIST> http://unicode.org/versions/Unicode8.0.0/ch12.pdf

<scribe> ACTION: Muthu (and Vivek) to verify the definitions in ch12 for Tamil

<trackbot> Created ACTION-12 - (and vivek) to verify the definitions in ch12 for tamil [on Muthu Nedumaran - due 2018-08-17].

muthu: I raised this because it is stated as ill-formed but i don't think that is correct

akshat: upshot is lack of clarity of akshara definition

<scribe> ACTION: Alolita to convert Muthu's comments to github issues

<trackbot> Created ACTION-13 - Convert muthu's comments to github issues [on Alolita Sharma - due 2018-08-17].

[[

2.12.1 Underline and Overline behaviour

Tamil and other south Indian scripts do not have a shirorekha or line below as in Devanagari. The underline should match that of Latin in a bilingual (or dual script) document, which is more common in Malaysia and Singapore. However, it needs to align with the underline of Devanagari when it combines with Hindi or Sanskrit.

]]

muthu: malaysia and singapore use tamil and documents with tamil and latin on same line, the underline should be at the same place for both
... all tamil fonts include latin glyphs too, so the issue doesn't arise so much

alolita: this issue would arise in india, esp in publishing with mixed scripts
... so gap is that rules don't exist for what should happen for position of underline and overline

r12a: recommend that we look at the CSS Text module and check whether it addresses these issues

https://drafts.csswg.org/css-text-decor-3/#line-decoration

http://w3c.github.io/typography/#text_decoration

<scribe> ACTION: Alolita (all) to review CSS specification for features

<trackbot> Created ACTION-14 - (all) to review css specification for features [on Alolita Sharma - due 2018-08-17].

[[

3.1 and 3.2 Line breaking and hyphenation

There are some simple rules for line breaking. Different people use different implementations. However, I can’t find a decent document for this online. Here’s a paper presented at a conference held in Singapore: https://www.academia.edu/671796/Tamil_Hyphenator_P._David_Prabhakar. The first 3 rules in the section "Rules for Tamil Hyphenation" are a good start.

]]

muthu: how do we frame the issue here ?

r12a: the gap would be that hyphenation is not happening for users in browsers, then the next step would be to ask why

vivek: cdac has rules for many languages and this may be available (though maybe not fully comprehensive) but could be a useful resource

discussion about how to find the information

<scribe> ACTION: Akshat to do a general search in CDAC for original rule book for hyphenation in 11 scripts

<trackbot> Created ACTION-16 - Do a general search in cdac for original rule book for hypenation in 11 scrpts [on Akshat Joshi - due 2018-08-17].

[[

3.4 Counters, lists, etc

Need to understand : the other relies on the user-defined mechanism specified in that spec in order to be applied.

Shouldn’t the default be ASCII numerals, and Tamil numerals be user defined?

]]

[[

3.5 Initial letter styling

Need to be mindful of conjuncts as defined in 2.8 above.

]]

[[

3.7 Other paragraph features

Tamil can start a paragraph with or without indents. Paragraph features are the same as English.

]]

<alolita_> yes

Meeting adjourned

Next meeting: two weeks
