alolita: review new issues, look at action items, start on comments
<alolita> https://www.w3.org/International/groups/indic-layout/track/actions/open
<scribe> A6: Complete devanagari and bengali sections 2.6
Complete devanagari and bengali sections 2.6 PENDING
Vivek to provide information about Bengali
A8: Add issue about zwj/zwnj stuff to begin fleshing out the problem
https://github.com/w3c/iip/issues/14
close action-8
<trackbot> Closed action-8.
A9: Add issue about devanagari numerals to help provide use case examples
https://github.com/w3c/iip/issues/15
vivek: native numerals are sometimes used, but i'm unable to see this as a gap - there is a straightforward mapping
akshat: on previous call we discussed initial text and there was some confusion about what Vivek was trying to say
<alolita> akshat: there is w3c css spec support for calendar, date support in Devanagari and Latin
akshat: if i want to choose a devanagari calendar it should not be dependent on the developer, but specified by w3c
... there's some confusion about what is being said
close action-9
<trackbot> Closed action-9.
alolita: please all add to the github issue
A10: Add text to 2.8 about general problems for segmentation PENDING
akshat: we'll add that today
alolita: there was some issue about adding Muthu's comments to github, so we should add after
muthu: i pointed out some areas that need attention
<alolita> muthu: there are 4 locales for tamil
<alolita> muthu: for locales ta_MY and ta_SG - a Latin oriented format is used for numerals
[[
2.7 Numbers, dates, etc
Tamil numerals have fallen out of common usage, though we do find them used occasionally by a few. ASCII numerals are used in common practice, and thus should be the default or fallback when there are no options available.
ta_MY and ta_SG follow the English number format (123,456,789,000) and do not follow the number format used in ta_IN and ta_LK.
]]
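A minimal sketch of the two grouping styles contrasted in the comment above. It assumes that ta_IN and ta_LK use Indian-style grouping (last three digits, then pairs), which the comment itself does not spell out. The function names and the locale table are hypothetical, for illustration only.

```python
def group_western(n: int) -> str:
    """Group digits in threes: 123,456,789,000."""
    return f"{n:,}"


def group_indian(n: int) -> str:
    """Group the last three digits, then pairs: 1,23,45,67,89,000."""
    digits = str(abs(n))
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    groups.append(tail)
    sign = "-" if n < 0 else ""
    return sign + ",".join(groups)


# Hypothetical locale table; the ta_IN / ta_LK entries are an assumption.
LOCALE_GROUPING = {
    "ta_IN": group_indian,
    "ta_LK": group_indian,
    "ta_MY": group_western,
    "ta_SG": group_western,
}


def format_grouped(n: int, locale: str) -> str:
    """Format n with ASCII digits using the grouping assumed for the locale."""
    return LOCALE_GROUPING[locale](n)


print(format_grouped(123456789000, "ta_SG"))  # 123,456,789,000
print(format_grouped(123456789000, "ta_IN"))  # 1,23,45,67,89,000
```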
alolita: tamil numerals only used in classical texts?
muthu: correct
... i think also in malayalam and telugu not used
... in kannada they are used
alolita: should both be available?
muthu: generally only ascii needed, but it would be nice to have an option for users to use native numbers
... there are some who may want to read in tamil numerals
alolita: how about in calendars?
muthu: all ascii for tamil
<alolita> richard: originally tamil did not have a zero
richard: if people want to use tamil numerals would they use them as a decimal-based system?
muthu: yes
<alolita> muthu: old books from 50+ years ago have tamil numerals
neha: if i want to display numbers in tamil there should be some tag to change numbers to tamil
muthu: if such a tag is not provided, then ascii numbers should be used
neha: that is a gap right now - no tag to switch to tamil numerals
alolita: we have noted that there is a gap that needs to be addressed
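A rough sketch of the behaviour discussed here: ASCII digits by default, with Tamil digits used positionally (decimal, including zero) only when explicitly requested. The `use_tamil_digits` flag is a hypothetical stand-in for the missing tag. It is not an existing API.

```python
ASCII_DIGITS = "0123456789"
# Tamil decimal digits occupy U+0BE6 (zero) through U+0BEF (nine).
TAMIL_DIGITS = "".join(chr(0x0BE6 + i) for i in range(10))

TO_TAMIL = str.maketrans(ASCII_DIGITS, TAMIL_DIGITS)
TO_ASCII = str.maketrans(TAMIL_DIGITS, ASCII_DIGITS)


def format_number(value: int, use_tamil_digits: bool = False) -> str:
    """Return ASCII digits unless Tamil digits are explicitly requested."""
    text = str(value)
    return text.translate(TO_TAMIL) if use_tamil_digits else text


def parse_number(text: str) -> int:
    """Accept either ASCII or Tamil decimal digits."""
    return int(text.translate(TO_ASCII))


print(format_number(2018))        # ASCII fallback: 2018
print(format_number(2018, True))  # the same value written with Tamil digits
```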
muthu: all the ta locales are the same, including sinhalese
[[
2.8 Text boundaries & selection
There are only two sequences of characters that form conjuncts in Tamil. Neither is native to Tamil: ஶ்ரீ and க்ஷ. Other than these two, no other CHC combinations form conjuncts. We should be able to place the cursor between the H and C (eg: CH<cursor>C). This issue was fixed in Android Oreo and iOS 12. The problem exists in many places and needs checking to identify which browsers support this and which do not.
]]
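A minimal sketch of the cursor behaviour described in the comment above, assuming a simplified model: a base character plus any trailing combining marks forms one cluster, and only the two listed conjuncts are kept together across the pulli. The function names are hypothetical and this is not a full grapheme segmentation implementation.

```python
import unicodedata

# The only two conjunct-forming sequences named in the comment:
# KA + PULLI + SSA (க்ஷ) and SHA + PULLI + RA (the start of ஶ்ரீ).
CONJUNCTS = ("\u0B95\u0BCD\u0BB7", "\u0BB6\u0BCD\u0BB0")


def tamil_clusters(text: str) -> list[str]:
    """Split text into clusters between which a cursor may be placed."""
    clusters = []
    i = 0
    while i < len(text):
        # Keep the two listed conjuncts whole; otherwise take one base.
        j = i + 3 if text.startswith(CONJUNCTS, i) else i + 1
        # Attach trailing combining marks (vowel signs, pulli).
        while j < len(text) and unicodedata.category(text[j]) in ("Mn", "Mc"):
            j += 1
        clusters.append(text[i:j])
        i = j
    return clusters


def cursor_stops(text: str) -> list[int]:
    """Return the code point offsets where the cursor may rest."""
    stops = [0]
    for cluster in tamil_clusters(text):
        stops.append(stops[-1] + len(cluster))
    return stops


# PA, KA+PULLI, KA, MA+PULLI: the cursor can sit between KA+PULLI and KA.
word = "\u0BAA\u0B95\u0BCD\u0B95\u0BAE\u0BCD"
print(tamil_clusters(word))
print(cursor_stops(word))  # [0, 1, 3, 4, 6]
```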
alolita: if you want to translate a historical text into tamil how will it be translated? with or without conjuncts?
muthu: in modern languages they write phonetically and pulli remains visible
r12a: https://github.com/w3c/ilreq/issues/31 is a related issue
<scribe> ACTION: r12a to raise tamil segmentation issue in our repo
<trackbot> Created ACTION-11 - Raise tamil segmentation issue in our repo [on Richard Ishida - due 2018-08-17].
alolita: so this issue is fixed in recent platforms - you can now put the cursor between
muthu: yes
neha: the segmentation rules for akshara @@@
https://w3c.github.io/ilreq/#h_indic_orthographic_syllable_boundaries
vivek: tamil doesn't fall in line with other scripts for handling of clusters
muthu summarises neha and akshat
muthu: ilreq has already specified the halant cluster model - vivek is saying that doesn't cover tamil because it's a different
<alolita> akshat: there are 2 definitions of akshara
<alolita> akshat: one definition refers to one encoding for all indian scripts
<alolita> akshat: this is the IS13194 definition
akshat: there are two actual definitions today, iscii (IS 13194) lists all conjuncts
<alolita> akshat: the other definition is from unicode
akshat: when unicode came around it broke away individual scripts into separate code pages, unlike iscii
<alolita> akshat: unicode instead allocated different code pages for each indian language script
<alolita> akshat: in the ilreq document, the scripts and segmentation definitions are not clear
akshat: ilreq doc is unicode specific but doesn't clarify in terms of what scripts are supported - the definition is oriented towards devanagari languages, except for santali
... but bengali, malayalam, gurmukhi requirements are not captured by ilreq
... for tamil we don't need new categories to add to this definition
... definition talks about CHC but in tamil it's only applicable for the two conjuncts
alolita: going back to muthu and vivek, there should be a clear definition for tamil so that it can be used as foundation for unicode
... having the clarification of differences is needed - that's a gap
[[
2.10.1 Syllable/Akshara spacing
Need to understand what is meant by: Consonant+Matra+Matra, the breaking seems to stack ill-formed akshara into one set instead of clearly breaking them apart. This breaking behaviour needs to improve.
Consonant+Matra+Matra is valid in Tamil
]]
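One case where Consonant+Matra+Matra can arise in valid Tamil text is canonical decomposition: the two-part vowel signs decompose into two vowel signs under NFD. The sketch below illustrates this with Python's standard `unicodedata` module. It is offered as an illustration of the comment above, not as something stated in the call, and `matra_sequence` is a hypothetical helper name.

```python
import unicodedata

# KA followed by the two-part VOWEL SIGN O (U+0BCA).
KO = "\u0B95\u0BCA"


def matra_sequence(text: str) -> list[str]:
    """Return the character names of the NFD form of text."""
    decomposed = unicodedata.normalize("NFD", text)
    return [unicodedata.name(ch) for ch in decomposed]


# After NFD the single vowel sign becomes two: Consonant+Matra+Matra.
for name in matra_sequence(KO):
    print(name)
# TAMIL LETTER KA
# TAMIL VOWEL SIGN E
# TAMIL VOWEL SIGN AA

# NFC recomposes the pair, so both spellings are canonically equivalent.
assert unicodedata.normalize("NFC", "\u0B95\u0BC6\u0BBE") == KO
```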
<alolita> vivek: eastern scripts (bengali, oriya) and southern scripts both support split matras
vivek: describes use of matras...
... this is a massive bug and common to most of our languages
<alolita> vivek: there is no clear definition - consonant+recursive-matras should not be allowed
<alolita> ... the unicode spec should be corrected to reflect this
<alolita> consonant+matra+matra is allowed in unicode
akshat: when you say that multiple matras are allowed in unicode - is this application specific ?
<alolita> ... open type also supports this unicode definition
akshat: that's an implementation issue rather than unicode issue
vivek: please point to the part of the unicode standard that describes this
<alolita> akshat: clear rules for syllable boundaries need to be defined
akshat: whatever unicode says is in ch12 but doesn't specify what should join and what not
<C-DAC_GIST> http://unicode.org/versions/Unicode8.0.0/ch12.pdf
<scribe> ACTION: Muthu (and Vivek) to verify the definitions in ch12 for Tamil
<trackbot> Created ACTION-12 - (and vivek) to verify the definitions in ch12 for tamil [on Muthu Nedumaran - due 2018-08-17].
muthu: I raised this because it is stated as ill-formed but i don't think that is correct
akshat: upshot is lack of clarity of akshara definition
<scribe> ACTION: Alolita to convert Muthu's comments to github issues
<trackbot> Created ACTION-13 - Convert muthu's comments to github issues [on Alolita Sharma - due 2018-08-17].
[[
2.12.1 Underline and Overline behaviour
Tamil and other south Indian scripts do not have a shirorekha or line below as in Devanagari. The underline should match that of Latin in a bilingual (or dual script) document, which is more common in Malaysia and Singapore. However, it needs to align with the underline of Devanagari when it combines with Hindi or Sanskrit.
]]
muthu: malaysia and singapore use tamil, and in documents with tamil and latin on the same line the underline should be at the same place for both
... all tamil fonts include latin glyphs too, so the issue doesn't arise so much
alolita: this issue would arise in india, esp in publishing with mixed scripts
... so gap is that rules don't exist for what should happen for position of underline and overline
r12a: recommend that we look at the CSS Text module and check whether it addresses these issues
https://drafts.csswg.org/css-text-decor-3/#line-decoration
http://w3c.github.io/typography/#text_decoration
<scribe> ACTION: Alolita (all) to review CSS specification for features
<trackbot> Created ACTION-14 - (all) to review css specification for features [on Alolita Sharma - due 2018-08-17].
[[
3.1 and 3.2 Line breaking and hyphenation
There are some simple rules for line breaking. Different people use different implementations. However, I can’t find a decent document for this online. Here’s a paper presented at a conference held in Singapore: https://www.academia.edu/671796/Tamil_Hyphenator_P._David_Prabhakar. The first 3 rules in the section "Rules for Tamil Hyphenation" are a good start.
]]
muthu: how do we frame the issue here ?
r12a: the gap would be that hyphenation is not happening for users in browsers, then the next step would be to ask why
vivek: cdac has rules for many languages and this may be available (though maybe not fully comprehensive) but could be a useful resource
discussion about how to find the information
<scribe> ACTION: Akshat to do a general search in CDAC for original rule book for hyphenation in 11 scripts
<trackbot> Created ACTION-16 - Do a general search in cdac for original rule book for hyphenation in 11 scripts [on Akshat Joshi - due 2018-08-17].
[[
3.4 Counters, lists, etc
Need to understand: "the other relies on the user-defined mechanism specified in that spec in order to be applied."
Shouldn’t the default be ASCII numerals, and Tamil numerals be user-defined?
]]
[[
3.5 Initial letter styling
Need to be mindful of conjuncts as defined in 2.8 above.
]]
[[
3.7 Other paragraph features
Tamil can start a paragraph with or without indents. Paragraph features are the same as English.
]]
<alolita_> yes
Meeting adjourned
Next meeting: two weeks