08:08:02 <logger> logger has joined #rdfcore-i18n

08:08:02 <devlin.openprojects.net> Users on #rdfcore-i18n: logger gk @bwm

08:08:47 <bwm> breakout session rdfcore/i18n

08:14:57 <bwm> rdfcore's role is to clarify m&s

08:14:57 <bwm> aim to make graph concept more mathematical

08:14:57 <bwm> rdfcore is not doing new stuff

08:14:57 <bwm> if anything we are throwing a few things out

08:14:57 <bwm> the world has moved on since m&s and the specs will reflect that

08:15:03 <bwm> m&s has a sentence that implies internationalization of iri's

08:15:22 <gk> * gk Brian: question of order: is it intended that the main WG can follow the IRC??

08:16:10 <gk> * gk Just wondered about posting the room name to #rdfcore

08:16:55 <bwm> some issues are out of scope for what we are trying to do

08:18:26 <bwm> the documents we are producing are: model theory, syntax document, test cases (needs equality), primer, schema

08:18:32 <gk> Test cases are syntax->graph mapping; need clear concept of literal equality to make that work

08:21:57 <bwm> RDFCORE would like to review our approach to internationalization with i18n

08:23:36 <gk> Objectives twofold I think: (a) address specific issues raised, (b) understand what I18N goals we consider to be in scope for RDFcore, ??

08:23:58 <bwm> Propose: Literals in the graph should be in normal form C

08:26:11 <bwm> Not talking about parseType=literal yet

08:28:02 <gk> Who is responsible for checking normalization form?

08:28:05 <bwm> RDFCore has no processing model

08:28:37 <gk> GK position... don't try to specify handling of ill-formed documents

08:29:41 <bwm> Literals in the graph are in unicode

08:33:47 <bwm> where are character references expanded?

08:33:47 <bwm> latest version of charmod has increased requirements on xml, such that in fully normalized xml documents strings beginning with a cedilla are not legal

08:33:47 <gk> Anything that may be concatenated in subsequent processing... may not start with cedilla, etc.

08:33:47 <gk> 3 levels normalization: Unicode (just text - if doesn't contain, say, c followed by cedilla)

08:34:11 <bwm> level 2: include normalized: once escapes are expanded it's still normalized

08:34:28 <gk> Include normalized: after resolution of escapes, is still (Unicode) normalized

08:35:09 <bwm> level 3: to protect things up the food chain that rely on the xml parser: if it does things like stripping out comments, then it won't become denormalized

08:35:37 <bwm> level 3 is concat safe but not substring safe

08:35:45 <gk> Fully normalized: remains normalized after stripping out comments, which may result in concatenation of designated substrings (according to the syntax, say XML, of the text)
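
[Editorial sketch, not part of the discussion: the three levels described above can be illustrated in Python. This is a rough approximation under stated assumptions: Unicode normalization is taken to be NFC, escapes are taken to be XML-style numeric character references, and "fully normalized" is approximated as "does not start with a combining character". The function names are hypothetical and Charmod's own definitions are more detailed.]

```python
import html
import unicodedata


def is_nfc(text: str) -> bool:
    """Level 1: the text as written is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


def is_include_normalized(text: str) -> bool:
    """Level 2 (approximation): still NFC after character references are expanded."""
    return is_nfc(text) and is_nfc(html.unescape(text))


def is_fully_normalized(text: str) -> bool:
    """Level 3 (approximation): level 2, and the text does not begin with a
    combining character, so prepending other text cannot denormalize it."""
    starts_with_combining = bool(text) and unicodedata.combining(text[0]) != 0
    return is_include_normalized(text) and not starts_with_combining


# "c" followed by COMBINING CEDILLA is not NFC; the precomposed form is.
decomposed = "c\u0327"
precomposed = "\u00e7"

# A lone cedilla is NFC on its own, but concatenation breaks normalization.
cedilla = "\u0327"
joined = "c" + cedilla
```

With these definitions, `"c&#x327;"` passes level 1 as written but fails level 2 once the reference is expanded, and a string consisting of a lone cedilla passes levels 1 and 2 but fails level 3.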

08:37:58 <gk> Coming from fully-normalized XML, literals will be Unicode normalized??????? (or fully normalized: not clear to me -- if we don't know the internal structure of the literal)

08:38:24 <gk> What about literals containing single combining character???

08:39:06 <bwm> i18n would like literals in the graph to be level 3 normalized

08:39:16 <gk> Use special quoting sequence in the string? (e.g. use numeric value?)

08:39:21 <bwm> rdfcore does not define what an api does

08:42:01 <bwm> Agreed: i18n advise fully normalised strings in the graph

08:42:10 <gk> I have a concern: fully normalized presumes you know what can be done with the string - e.g. is it XML or something else?

08:43:04 <bwm> Graham - I'd like to be a record of the meeting -

08:43:18 <bwm> I'd like this to be a record of the meeting

08:43:31 <gk> * gk You plan to use this raw?

08:45:09 <bwm> question: what do we do with iri's

08:45:28 <bwm> three alternatives from email

08:45:43 <bwm> problem of losing information if reduce to us ascii

08:48:13 <bwm> wherever you use a uri you must allow an iri. when we tidy a graph we are only allowed one node with a given iri label. equality of iri's is defined by algorithm c.

08:49:15 <gk> Suggestion: Comparison of URIs behaves as if all URIs/IRIs are Unicode-normalized, but spec doesn't have to say that such normalization MUST be performed; i.e. it is a comment on the definition of URI string equality

08:50:17 <bwm> if we have two iri's that are equal, then we just pick one, which means they won't round trip precisely.
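
[Editorial sketch, not part of the discussion: one way to picture "compare as if normalized, then pick one label when tidying". It assumes NFC as the normalization and plain string comparison afterwards; it is not claimed to be the "algorithm c" mentioned above, and the function names are hypothetical.]

```python
import unicodedata


def iri_equal(a: str, b: str) -> bool:
    """Compare two IRI labels as if both had been NFC-normalized."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def tidy_labels(labels):
    """Keep one label per equivalence class, choosing the first one seen.

    The stored label is whichever spelling arrived first, so a later,
    differently-spelled but equal IRI will not round-trip exactly.
    """
    chosen = {}
    for label in labels:
        key = unicodedata.normalize("NFC", label)
        chosen.setdefault(key, label)
    return list(chosen.values())


# Two spellings of the same IRI: decomposed and precomposed c-cedilla.
# The host name is a made-up example.
first = "http://example.org/fran\u0063\u0327ais"
second = "http://example.org/fran\u00e7ais"
nodes = tidy_labels([first, second, "http://example.org/other"])
```

Here `nodes` holds two labels, and the surviving spelling of the first IRI is the decomposed one because it was seen first.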

08:51:33 <bwm> next lang without parsetype literal

08:52:55 <gk> Consensus among developers who have implemented language tagging of literals is that the language info is part of the resulting literal

08:53:34 <bwm> rdfcore heading for using pair to represent strings with lang

08:53:52 <bwm> compare on lang component is case insensitive

08:54:09 <bwm> two issues: exact matching. ontology for languages.

08:55:15 <gk> Proposed rules: language and string must be identical - no lang does not match lang

08:55:21 <bwm> proposed rules for matching that strings don't match if one literal has lang and the other does not

08:56:18 <gk> I18N want match more flexible, if one has lang and the other doesn't, or if one is prefix of other.
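
[Editorial sketch, not part of the discussion: the two matching behaviours above, side by side. It assumes a literal is a (string, language) pair, that language comparison is case-insensitive, and that "prefix" means a whole-subtag prefix such as "en" for "en-GB". The class and function names are hypothetical.]

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    text: str
    lang: Optional[str] = None  # None means the literal has no language tag


def graph_equal(a: Literal, b: Literal) -> bool:
    """Strict rule proposed for the graph: string and language must both
    match, and a literal without a language never equals one with a language."""
    if (a.lang is None) != (b.lang is None):
        return False
    if a.lang is None:
        return a.text == b.text
    return a.text == b.text and a.lang.lower() == b.lang.lower()


def app_match(a: Literal, b: Literal) -> bool:
    """Looser application-level matching: a missing language matches anything,
    and one tag may be a subtag prefix of the other."""
    if a.text != b.text:
        return False
    if a.lang is None or b.lang is None:
        return True
    x, y = a.lang.lower(), b.lang.lower()
    return x == y or x.startswith(y + "-") or y.startswith(x + "-")
```

Under this sketch `Literal("colour", "en")` and `Literal("colour", "en-GB")` are distinct in the graph but match at the application level, which is the kind of fancier matching applications could still choose to do.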

08:57:31 <gk> Currently, also, RDF looks like going with tidy literals -- meaning that equal literals are merged in the graph. Thus equality is a critical consideration.

08:57:34 <bwm> what do i18n do with requirements that don't pertain to the graph.

08:57:39 <bwm> answer talk to the app developers

08:59:29 <gk> Misha: ask for a NOTE: in the document that applications must deal with language matching in a sensible way, where appropriate.

08:59:41 <bwm> i18n request a note in the text to suggest that app developers might do other string application matching

09:00:44 <bwm> actual request: requirement is to ensure we don't mislead the app developers into thinking they are not allowed to do fancier string matching.

09:02:14 <gk> Misha: Ontology for language issue: people in charge of language tags periodically change them. If RDF graphs are supposed to be persistent, that's painful.

09:02:30 <gk> ... instead of natural strings, use URIs?

09:02:36 <bwm> lang tags periodically change, which is a problem if rdf graphs are long lasting; so use a uri for the language, not the lang tag.

09:03:31 <bwm> interesting idea - we'll think about it

09:04:22 <bwm> in some way we want it to represent the abstract xml between the two tags.

09:04:55 <bwm> we need at least a bit to say that its xml

09:05:17 <bwm> represent the xml with a string that is the canonicalised representation of the xml

09:07:08 <bwm> the xml parser should do it right so we don't have to do anything

09:10:17 <bwm> xml lang is inherited in xml - that is correctly handled by rdf/xml translation

09:12:09 <bwm> requirement is the graph round trips. misha happy.

09:12:29 <bwm> Jeremy discusses complicated xml:lang example from yesterday

09:13:32 <bwm> requirement is that language tags are present where possible

09:13:55 <gk> Much more important that small strings be language-tagged than larger documents -- language-sniffing techniques don't work in shorter strings

09:13:56 <bwm> language tags more important on small strings cos its harder to 'sniff' the language

09:14:53 <bwm> Our proposed solution is acceptable

09:15:32 <bwm> how do we decide two parsetype literals are equal

09:17:04 <gk> MartinD: complete canonicalization is unlikely to be achieved.

09:18:50 <bwm> lang works ok - we've examined nasty test case

09:18:59 <bwm> equality is our problem - no internationalization issues

09:23:38 <bwm> i18n request that we use same escaping mechanism for iri's as for all other characters.

09:23:49 <bwm> requirement is to recover the iri when it's back in the graph.

09:24:39 <gk> I think the problem here is that URIs/IRIs and string contents are escaped in different ways???

09:24:59 <bwm> problem is that loose form of iri

09:25:33 <gk> Which makes it difficult to recover the original IRI... but if we used string-escaping style that wouldn't happen.

09:27:06 <gk> Misha has expressed a concern that the N-triple format will "escape" its intended purpose of testing.

09:28:09 <gk> bwm: putting an I18N burden on working groups' internal tool formats is too much of a burden.

09:28:17 <bwm> clear health warning this is not a 'public' syntax

09:29:43 <bwm> misha not happy about escape syntax used

09:30:45 <bwm> rdfcore used fixed width escape sequence
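
[Editorial sketch, not part of the discussion: what a fixed-width escape scheme can look like, and why it lets the original characters be recovered. The shapes used here, a backslash with "u" plus four hex digits and a backslash with "U" plus eight hex digits, are an assumption for illustration; the log does not state the exact sequences or the range, which is what the action below is about. Backslash and double quote are also escaped numerically here purely to keep the sketch short.]

```python
import re


def escape(text: str) -> str:
    """Escape everything outside printable ASCII with a fixed-width sequence."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp <= 0x7E and ch not in '\\"':
            out.append(ch)                 # printable ASCII passes through
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)     # four hex digits
        else:
            out.append("\\U%08X" % cp)     # eight hex digits
    return "".join(out)


_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape(text: str) -> str:
    """Invert escape(): turn each fixed-width sequence back into its character."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)


# Made-up example IRI containing a non-ASCII character.
original = "http://example.org/caf\u00e9"
escaped = escape(original)
```

Because every escape has a known length, `unescape(escape(s))` returns `s` unchanged, which is the recovery property asked for above.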

09:31:24 <bwm> action daveb to check escape sequence range

09:32:04 <bwm> deployment issue is that if we follow charmod we create dependencies on xml guys having implemented it

09:36:23 <bwm> two issues: holding up candidate rec process - maybe get waiver from director

09:36:35 <bwm> other - dependencies on specs which are not recs

09:36:49 <bwm> suggest we use SHOULD language

09:38:34 <gk> * gk My battery's about to die, I think. ...

IRC Log of RDFCore/I18N Breakout Meeting