14:49:23 RRSAgent has joined #annotation
14:49:23 logging to http://www.w3.org/2015/06/03-annotation-irc
14:52:22 Matt_Haas has joined #annotation
14:54:39 q?
14:56:50 Regrets: Frederick_Hirsch, Ray_Denenberg
14:56:54 Present+ Rob_Sanderson
14:58:00 TimCole has joined #annotation
14:58:18 bjdmeest has joined #annotation
14:58:26 Jacob has joined #annotation
14:58:27 Present+ bjdmeest
14:59:08 Janina_ has joined #annotation
14:59:40 Present+ Addison_Phillips
14:59:49 Present+ Matt_Haas
14:59:57 Present+ Tim_Cole
15:00:02 Present+ Matt_Haas
15:00:03 Present+ Ivan_Herman
15:01:06 Present+ Richard_Ishida
15:01:16 r12a has joined #annotation
15:01:36 Kyrce has joined #annotation
15:02:05 RangeFinder draft: http://w3c.github.io/web-annotation/api/rangefinder/
15:02:29 Present+ Janina_Sarol
15:02:57 Takeshi's character length research: https://gist.github.com/tkanai/e2984cfa14cf099baa94
15:03:04 tbdinesh has joined #annotation
15:03:11 Present+ Jacob_Jett
15:03:36 Addison is planning to attend
15:03:50 Present+ Kyrce_Swenson
15:04:08 MiguelAraCo has joined #annotation
15:04:13 present?
15:05:00 Present+ Tim_Cole
15:06:35 Present+ TB_Dinesh
15:06:49 davis_salisbury has joined #annotation
15:07:08 davis_salisbury present+
15:07:14 scribenick bjdmeest
15:07:23 [people introducing]
15:08:38 call in user 4 is David Salisbury
15:10:00 s/David/Davis/
15:10:01 takeshi has joined #annotation
15:11:22 Takeshi: can you join the call?
15:11:51 Present+ Benjamin_Young
15:12:01 zakim, who's here?
15:12:01 sorry, r12a, I don't know what conference this is
15:12:03 On IRC I see takeshi, davis_salisbury, MiguelAraCo, tbdinesh, Kyrce, r12a, Janina_, Jacob, bjdmeest, TimCole, Matt_Haas, RRSAgent, Zakim, azaroth, PaoloCiccarese, renoirb, dauwhe,
15:12:03 ...
ivan, KevinMarks, bigbluehat, tessierashpool_, csillag_, ujvari_, oshepherd, nickstenn, JakeHart, dwhly, stain, trackbot, Mitar, rhiaro
15:12:12 takeshi has joined #annotation
15:12:14 dauwhe has left #annotation
15:12:17 aphillip has joined #annotation
15:12:37 azaroth: issues to discuss:
15:12:45 ... i18n for annotation
15:13:14 ... and in particular (brought up in F2F of April): codepoint issues
15:13:44 ... and how that would affect the counts for the anchoring of an annotation
15:13:51 ... second issue: RangeFinder API
15:14:27 Present+ Paolo_Ciccarese
15:14:53 TOPIC: character counting
15:15:07 https://gist.github.com/tkanai/e2984cfa14cf099baa94
15:15:16 azaroth: issue is
15:15:25 ... in annotation model, there are different selectors
15:15:37 ... we need a consistent way for implementations to handle ranges by character count
15:16:19 ... text length is counted differently in different languages
15:16:25 ... (see gist)
15:16:28 Present+ Takeshi_Kanai
15:17:24 addison: programming languages that use UTF16 use code points
15:17:39 ... languages that use UTF8 hide that
15:18:04 ... code points make the most sense, most languages can find out code points
15:18:28 ... other issues: the 'user perceived boundaries'
15:18:46 ... e.g., boundary between colors, emoji, etc.
15:18:57 ... that is harder from a spec point of view
15:19:21 azaroth: so python uses code points, and javascript can get to the code points?
15:19:25 addison: yes
15:20:55 q+
15:21:07 azaroth: easiest is thus to count code points, and make note that javascript implementations will be able to do this, but currently can't
15:21:24 addisson: and help for e.g. unicode controls
15:21:32 q+
15:21:37 s/addisson/addison/
15:22:03 azaroth: e.g. e + acute character?
15:22:05 ack TimCole
15:22:12 addison: yes, and also emoji
15:22:37 TimCole: thinking about use cases: user can highlight part of text
15:22:39 q+
15:22:51 ... so perceived characters are what the user thinks he/she is annotating
15:23:28 ...
e.g., perceived characters that count differently on different devices:
15:23:44 ... is there a difference between laptop-viz and smartphone-viz?
15:23:52 addison: that won't be a problem
15:24:52 ... e.g. scripted selection or programmatic manipulation is tricky for user perceived text selection
15:25:29 ack r12a
15:25:41 ... boundaries will always be in the same places (codepoints), but scripting languages might translate those codepoints to different bytes (e.g. utf8 vs utf16)
15:26:19 r12a: there are other control characters, e.g., bidirectional controls
15:26:36 ... if one of those appears at the boundary of a selection, that might be an issue
15:27:15 ... also: if a user counts the number of characters they want, that will be more problematic
15:27:55 q+ to ask about polyfill possibility
15:28:20 ... third thing: in javascript it is possible to detect characters out of the 'normal range'
15:28:33 ack takeshi
15:30:00 takeshi: to my understanding: it is not necessary to find accurate character length, but enough if we can count letters with unicode code points
15:30:37 addison: invisible code points can change visualization of other code points. If you stop the boundary too soon, you might not pick up an important modifier
15:30:57 q+
15:31:30 takeshi: invisible character might become visible as a square
15:31:32 ack azaroth
15:31:32 azaroth, you wanted to ask about polyfill possibility
15:32:19 addison: programmers can detect when a length is 2 instead of 1, because that will lie in a certain range
15:32:44 q?
15:32:46 azaroth: is it possible to polyfill that? that probably needs some hardcore javascript programmer
15:32:47 ack r12a
15:33:11 r12a: the multilingual plane will always return a single code point
15:33:41 ... beyond that (e.g. chinese and japanese) will always need to be combined, they are surrogates
15:33:59 ... so you can look for the second 'character' to combine them
15:34:37 ... However, the question is: why would you need to count characters?
user selects text via highlight, so he/she is in control of the range
15:34:54 ... what gets picked up is a range of text, so you don't need to count
15:35:25 addison: you need to remember what the original annotated content was.
15:35:30 q?
15:35:49 ... When you try to compute where the annotation should go, that is a different use case than manual selection (e.g., RangeFinder)
15:36:22 azaroth: there are offset-based and string-based selectors: one of the issues is copyright and IP
15:36:51 ... if annotations each record 100 characters, then with enough annotations you could reconstruct the entire copyrighted text
15:37:18 ... with offset, you cannot reconstruct the entire text
15:37:33 ... so only recording the exact text string is not enough
15:38:01 q?
15:38:31 addison: offset is also more efficient: you don't want to ship the entire selection (if that selection is large) together with that annotation
15:38:46 TOPIC: RangeFinder API
15:39:26 azaroth: RangeFinder API is a browser-level API to discover ranges using input that describes that range of text
15:40:01 ... basic constructor has inputs, e.g., prefix, suffix, text, character start, character length, xpath
15:40:13 ... or case folding (is case important or not)
15:40:39 ... or unicode folding (is e+accent vs e important or not)
15:41:15 ... or whether word boundaries are important (should tooth also be used to identify toothpaste)
15:41:35 ... Addison had a look from an i18n point of view
15:41:47 https://lists.w3.org/Archives/Public/www-international/2015AprJun/0136.html
15:42:19 addison: we've been working on a character string model
15:42:59 ... first reaction: bunch of options, they might not all make sense
15:43:27 http://w3c.github.io/charmod-norm/#searching
15:43:40 azaroth: ways forward: first as simple as possible?
15:44:07 addison: no, you need to think of these problems (of other languages) from early on
15:44:34 q+
15:44:38 ...
e.g., word boundary is very easily implementable for latin-based languages, and makes sense for japanese, but japanese does not use spaces
15:44:50 ack r12a
15:45:01 ... that would need a dictionary
15:45:42 r12a: if you doubleclick on a word of any language, browsers can handle the word selection for japanese and thai...
15:45:44 q?
15:45:53 ... other languages use special characters to split words
15:46:25 addison: word boundary (browser-based) is not perfect
15:46:29 q+
15:47:10 azaroth: documentation should include those links, together with charmod
15:47:16 ... and keep RangeFinder in sync
15:47:48 addison: there is a section about case folding, that is language sensitive (spec might need to take that into account)
15:48:16 ... current text in RangeFinder about unicode normalization has some major challenges
15:48:36 ... you can do harm to text if you remove all combining characters, that might break some languages
15:48:47 ack TimCole
15:49:09 q+
15:49:12 TimCole: Anno WG needs to think more about use cases and requirements, in order to say:
15:49:37 see also: http://www.unicode.org/reports/tr10/#Searching
15:49:58 ... in languages X and Y, we will not support word folding
15:50:07 ... to make more realistic implementations for browsers
15:50:25 azaroth: stronger i18n-based use cases are needed
15:50:42 ack takeshi
15:53:31 addison: collation algorithm is more tuned to sorting tokens than to searching text
15:53:36 ... computing time is larger
15:53:54 azaroth: i18n for the body of the annotation, not just the target document segment, is a great point
15:54:04 q?
15:54:59 azaroth: encoding of the annotations themselves if body or target has i18n-characters
15:55:17 ... is that a problem for JSON(-LD) or Turtle that we need to take into account?
15:55:29 q+
15:55:42 q-
15:55:50 addison: both formats use a sequence of unicode characters, so it shouldn't be an issue
15:56:38 bigbluehat: [will continue about this on mail]
15:57:04 addison: we are very interested in input on how best to document this, and in getting reviews for our documents
15:57:53 azaroth: takeshi, and others, can you have a look at the character model document
15:58:13 ... thank you Addison and Richard for your input
15:59:10 rrsagent, draft minutes
15:59:10 I have made the request to generate http://www.w3.org/2015/06/03-annotation-minutes.html ivan
15:59:15 q+
15:59:16 r18n: [about RangeFinder API review] you can use the mailing list
15:59:20 ack r12a
15:59:24 ... of i18n working group
15:59:35 s/r18n/addison/
16:00:00 ... If there are comments about character model (@Takeshi), you can add issues to the github repo
16:00:00 tantek has joined #annotation
16:00:35 rrsagent, draft minutes
16:00:35 I have made the request to generate http://www.w3.org/2015/06/03-annotation-minutes.html ivan
16:00:36 shepazu has joined #annotation
16:01:11 tracker, end telcon
16:01:25 trackbot, end telcon
16:01:25 Zakim, list attendees
16:01:25 sorry, trackbot, I don't know what conference this is
16:01:33 RRSAgent, please draft minutes
16:01:33 I have made the request to generate http://www.w3.org/2015/06/03-annotation-minutes.html trackbot
16:01:34 RRSAgent, bye
16:01:53 rrsagent, set log public
16:01:58 rrsagent, bye
16:01:58 I see no action items
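
The character counting topic in this log turns on the gap between counting code points and counting UTF-16 code units. A minimal Python sketch of that gap follows, under these assumptions: Python's len() on a str counts code points, and the UTF-16 code unit count matches what JavaScript's String.length reports. The helper names (code_point_length, utf16_length, code_point_to_utf16_offset) are hypothetical and are not taken from the annotation model or the RangeFinder draft.

```python
def code_point_length(text: str) -> int:
    """Count Unicode code points; Python's len() on str already does this."""
    return len(text)


def utf16_length(text: str) -> int:
    """Count UTF-16 code units (2 bytes each), as JavaScript's String.length does."""
    return len(text.encode("utf-16-le")) // 2


def code_point_to_utf16_offset(text: str, offset: int) -> int:
    """Translate a code point offset into the matching UTF-16 code unit offset."""
    return utf16_length(text[:offset])


# 'e' + combining acute accent, a space, and one emoji outside the BMP.
sample = "e\u0301 \U0001F600"

print(code_point_length(sample))              # code points
print(utf16_length(sample))                   # UTF-16 code units
print(code_point_to_utf16_offset(sample, 4))  # offset just past the emoji
```

The emoji is one code point but two UTF-16 code units (a surrogate pair), so the two counts differ by one for this sample. Neither count reflects user-perceived characters: "e" plus the combining acute is two code points that render as a single character, which is the harder boundary problem raised in the discussion.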