IRC Log of RDFCore/I18N Breakout Meeting

Thanks to Dave Beckett's IRC Logger for creating the log.

08:08:02 <logger> logger has joined #rdfcore-i18n
08:08:02 <devlin.openprojects.net> Users on #rdfcore-i18n: logger gk @bwm
08:08:47 <bwm> breakout session rdfcore/i18n
08:08:55 <bwm> present
08:09:05 <bwm> brian mcbride
08:09:09 <bwm> jeremy carroll
08:09:13 <bwm> dave beckett
08:09:17 <bwm> graham klyne
08:09:21 <bwm> martin horner
08:09:27 <bwm> s/martin/martyn/
08:09:33 <bwm> misha wolfe
08:09:40 <bwm> martin duerst
08:09:55 <bwm> agenda:
08:09:58 <bwm> intro to rdfcore
08:10:05 <bwm> 3 issues:
08:10:08 <bwm> normalization
08:10:10 <bwm> xml:lang
08:10:12 <bwm> uri's
08:10:15 <bwm> n-triples
08:14:57 <bwm> rdfcore's role is to clarify m&s
08:14:57 <bwm> aim to make graph concept more mathematical
08:14:57 <bwm> rdfcore is not doing new stuff
08:14:57 <bwm> if anything we are throwing a few things out
08:14:57 <bwm> e.g. rdf:aboutEachPrefix
08:14:57 <bwm> the world has moved on since m&s and the specs will reflect that
08:15:03 <bwm> m&s has a sentence that implies internationalization of iri's
08:15:22 <gk> * gk Brian: question of order: is it intended that the main WG can follow the IRC??
08:15:48 <bwm> no
08:15:52 <bwm> why
08:16:04 <bwm> no reason why they shouldn't
08:16:10 <gk> * gk Just wondered about posting the room name to #rdfcore
08:16:19 <bwm> good idea - go ahead
08:16:55 <bwm> some issues are out of scope for what we are trying to do
08:18:26 <bwm> the documents we are producing are: model theory, syntax document, test cases (needs equality), primer, schema
08:18:32 <gk> Test cases are syntax->graph mapping;  need clear concept of literal equality to make that work
08:20:47 <bwm> Objectives of meeting
08:21:57 <bwm> RDFCORE would like to review our approach to internationalization with i18n
08:22:08 <bwm> Issue:  Normalization
08:23:36 <gk> Objectives twofold I think:  (a) address specific issues raised, (b) understand what I18N goals we consider to be in scope for RDFcore, ??
08:23:58 <bwm> Propose: Literals in the graph should be in normal form C
08:26:11 <bwm> Not talking about parseType=literal yet
08:28:02 <gk> Who is responsible for checking normalization form?
08:28:05 <bwm> RDFCore has no processing model
08:28:37 <gk> GK position... don't try to specify handling of ill-formed documents
08:29:41 <bwm> Literals in the graph are in unicode
08:33:47 <bwm> where are character references expanded?
08:33:47 <bwm> latest version of charmod has increased the requirement on xml, such that in fully normalized xml documents strings beginning with a cedilla are not legal
08:33:47 <gk> Anything that may be concatenated in subsequent processing... may not start with cedilla, etc.
08:33:47 <gk> 3 levels normalization: Unicode (just text - if doesn't contain, say, c followed by cedilla)
08:33:47 <bwm> level 1: unicode normalized
08:34:11 <bwm> level 2: include normalized: once escapes are expanded it's still normalized
08:34:28 <gk> Include normalized:  after resolution of escapes, is still (Unicode) normalized
08:35:09 <bwm> level 3: to protect things up the food chain that rely on the xml parser: if it does things like stripping out comments, then the text won't become denormalized
08:35:37 <bwm> level 3 is concat safe but not take substring safe
08:35:45 <gk> Fully normalized:  remains normalized after stripping out comments, which may result in concatenation of designated substrings (according to the syntax, say XML, of the text)
08:37:58 <gk> Coming from fully-normalized XML, literals will be Unicode normalized???????  (or fully normalized:  not clear to me -- if we don't know the internal structure of the literal)
08:38:24 <gk> What about literals containing single combining character???
08:39:06 <bwm> i18n would like literals in the graph to be level 3 normalized
08:39:16 <gk> Use special quoting sequence in the string?  (e.g. use numeric value?)
08:39:21 <bwm> rdfcore does not define what an api does
08:42:01 <bwm> Agreed: i18n advise fully normalised strings in the graph
08:42:10 <gk> I have a concern:  fully normalized presumes you know what can be done with the string - e.g. is it XML or something else?
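
The concatenation problem behind the "fully normalized" level discussed above can be sketched in Python. This is only an illustration using the standard `unicodedata` module; the helper names `is_nfc` and `starts_with_combining` are hypothetical, and testing the combining class of the first character is a simplification of the charmod rule, not its actual definition.

```python
import unicodedata

def is_nfc(text: str) -> bool:
    """Level 1: the text on its own is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text

def starts_with_combining(text: str) -> bool:
    """Simplified check (an assumption, not the charmod definition):
    the first character has a non-zero canonical combining class."""
    return bool(text) and unicodedata.combining(text[0]) != 0

# U+0327 is COMBINING CEDILLA. On its own it is in NFC, but it
# composes with a preceding "c" once the two strings are joined.
left = "c"
right = "\u0327a"
joined = left + right

# Each piece is normalized, yet the concatenation is not.
pieces_ok = is_nfc(left) and is_nfc(right)
joined_ok = is_nfc(joined)
```

Here `pieces_ok` is true and `joined_ok` is false, which is why a string that may later be concatenated must not begin with a character such as a cedilla.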
08:42:35 <bwm> uris:
08:43:04 <bwm> Graham - I'd like to be a record of the meeting -
08:43:18 <bwm> I'd like this to be a record of the meeting
08:43:31 <gk> * gk You plan to use this raw?
08:45:09 <bwm> question: what do we do with iri's
08:45:28 <bwm> three alternatives from email
08:45:43 <bwm> problem of losing information if reduce to us ascii
08:48:13 <bwm> wherever you use a uri you must allow an iri.  when we tidy a graph we are only allowed one node with a given iri label.  equality of iri's is defined by algorithm c.
08:49:15 <gk> Suggestion:  Comparison of URIs behaves as if all URIs/IRIs are Unicode-normalized, but spec doesn't have to say that such normalization MUST be performed;  i.e. is comment on definition of URI string equality
08:50:17 <bwm> if we have two iri's that are equal, then we just pick one, which means they won't round trip precisely.
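
A minimal sketch of the tidying behaviour described above, assuming that equality means "equal after Unicode normalization to form C" as in gk's suggestion. The log does not spell out what "algorithm c" is, so that reading is an assumption, and the names `iri_key` and `tidy` and the example.org IRIs are hypothetical.

```python
import unicodedata

def iri_key(iri: str) -> str:
    """Comparison key: behaves as if the IRI were NFC-normalized (assumed rule)."""
    return unicodedata.normalize("NFC", iri)

def tidy(labels):
    """Keep one node label per group of equal IRIs: the first one seen."""
    nodes = {}
    for label in labels:
        nodes.setdefault(iri_key(label), label)
    return list(nodes.values())

# Two spellings of the same IRI: precomposed and decomposed cedilla.
composed = "http://example.org/fran\u00e7ais"
decomposed = "http://example.org/franc\u0327ais"

result = tidy([decomposed, composed])
```

Only one label survives in `result`, so whichever spelling was dropped cannot be recovered when the graph is written out again. That is the loss of precise round-tripping noted above.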
08:51:33 <bwm> next lang without parsetype literal
08:52:55 <gk> Consensus among developers who have implemented language tagging of literals is that the language info is part of the resulting literal
08:53:05 <bwm> thank you
08:53:34 <bwm> rdfcore heading for using pair to represent strings with lang
08:53:52 <bwm> compare on lang component is case insensitive
08:54:09 <bwm> two issues: exact matching.  ontology for languages.
08:55:15 <gk> Proposed rules:  language and string must be identical - no lang does not match lang
08:55:21 <bwm> proposed rule for matching: strings don't match if one literal has lang and the other does not
08:56:18 <gk> I18N want match more flexible, if one has lang and the other doesn't, or if one is prefix of other.
08:56:19 <bwm> rdfcore needs transitivity
08:57:31 <gk> Currently, also, RDF looks like going with tidy literals -- meaning that equal literals are merged in the graph.  Thus equality is a critical consideration.
08:57:34 <bwm> what do i18n do with requirements that don't pertain to the graph.
08:57:39 <bwm> answer talk to the app developers
08:59:29 <gk> Misha:  ask for a NOTE: in the document that applications must deal with language matching in a sensible way, where appropriate.
08:59:41 <bwm> i18n request a note in the text to suggest that app developers might do other kinds of string matching in their applications
09:00:44 <bwm> actual request: requirement is to ensure we don't mislead the app developers into thinking they are not allowed to do fancier string matching.
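
The two matching rules discussed above can be compared in a short Python sketch. It assumes a literal is a (string, lang) pair as bwm describes; the `Literal` type and the functions `same_literal` and `loose_match` are hypothetical names, and the loose rule is only one possible reading of the flexible matching I18N asked for.

```python
from typing import NamedTuple, Optional

class Literal(NamedTuple):
    text: str
    lang: Optional[str] = None  # None means no language tag

def same_literal(a: Literal, b: Literal) -> bool:
    """Exact rule: strings identical, lang identical ignoring case,
    and a literal with no lang never equals one that has a lang."""
    if a.text != b.text:
        return False
    if a.lang is None or b.lang is None:
        return a.lang is None and b.lang is None
    return a.lang.lower() == b.lang.lower()

def loose_match(a: Literal, b: Literal) -> bool:
    """Flexible rule (assumed reading): a missing lang matches any lang,
    and a tag matches another tag that it is a prefix of."""
    if a.text != b.text:
        return False
    if a.lang is None or b.lang is None:
        return True
    x, y = a.lang.lower(), b.lang.lower()
    return x == y or x.startswith(y + "-") or y.startswith(x + "-")

plain = Literal("chat")
en = Literal("chat", "en")
fr = Literal("chat", "fr")

# en matches plain and plain matches fr, but en does not match fr.
transitive = not (loose_match(en, plain) and loose_match(plain, fr)) or loose_match(en, fr)
```

With these values `transitive` is false for the loose rule, which is why it cannot serve as graph equality when equal literals are merged. It can still be offered by applications on top of the exact rule.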
09:01:07 <bwm> 30 mins to go
09:01:52 <bwm> ontology of language:
09:02:14 <gk> Misha:  Ontology for language issue:  people in charge of language tags periodically change them.  If RDF graphs are supposed to be persistent, that's painful.
09:02:30 <gk> ... instead of natural strings, use URIs? 
09:02:36 <bwm> lang tags periodically change, which is a problem if rdf graphs are long lasting; so use a uri for the language, not the lang tag.
09:03:31 <bwm> interesting idea - we'll think about it
09:03:47 <bwm> parseType=Literal
09:04:22 <bwm> in some way we want it to represent the abstract xml between the two tags.
09:04:55 <bwm> we need at least a bit to say that its xml
09:05:17 <bwm> represent the xml with a string that is the canonicalised representation of the xml
09:07:08 <bwm> the xml parser should do it right so we don't have to do anything
09:10:17 <bwm> xml lang is inherited in xml - that is correctly handled by rdf/xml translation
09:12:09 <bwm> requirement is the graph round trips.  misha happy.
09:12:29 <bwm> Jeremy discusses complicated xml:lang example from yesterday
09:13:32 <bwm> requirement is that language tags are present where possible
09:13:55 <gk> Much more important that small strings be language-tagged than larger documents -- language-sniffing techniques don't work in shorter strings
09:13:56 <bwm> language tags more important on small strings cos its harder to 'sniff' the language
09:14:09 <bwm> beat ya
09:14:14 <bwm> just
09:14:53 <bwm> Our proposed solution is acceptable
09:15:32 <bwm> how do we decide two parsetype literals are equal
09:17:04 <gk> MartinD:  complete canonicalization is unlikely to be achieved.
09:18:32 <bwm> normalization: xml does it
09:18:50 <bwm> lang works ok - we've examined nasty test case
09:18:59 <bwm> equality is our problem - no internationalization issues
09:19:24 <bwm> n-triples
09:19:45 <bwm> is a format for test cases
09:23:38 <bwm> i18n request that we use same escaping mechanism for iri's as for all other characters.
09:23:49 <bwm> requirement is to recover the iri when it's back in the graph.
09:24:39 <gk> I think the problem here is that URIs/IRIs and string contents are escaped in different ways???
09:24:59 <bwm> problem is that we lose the form of the iri
09:25:33 <gk> Which makes it difficult to recover the original IRI... but if we used string-escaping style that wouldn't happen.
09:27:06 <gk> Misha has expressed a concern that the N-triple format will "escape" its intended purpose of testing.
09:28:09 <gk> bwm:  putting an I18N burden on working groups' internal tool formats is too much of a burden.
09:28:17 <bwm> clear health warning this is not a 'public' syntax
09:28:23 <bwm> its not for interoperability
09:29:43 <bwm> misha not happy about escape syntax used
09:29:53 <bwm> its based on python
09:30:45 <bwm> rdfcore used fixed width escape sequence
09:31:24 <bwm> action daveb to check escape sequence range
09:31:30 <bwm> rangs
09:31:32 <bwm> ranges
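
A rough sketch of the fixed-width, Python-style escaping mentioned above, assuming `\uXXXX` for code points up to U+FFFF and `\UXXXXXXXX` above that. The exact ranges are what the action on daveb is meant to check, so the cut-off points here are assumptions, and `escape_ntriples` is a hypothetical helper name.

```python
def escape_ntriples(text: str) -> str:
    """Escape a string so the result contains only printable US-ASCII."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        elif ch == "\t":
            out.append("\\t")
        elif 0x20 <= cp <= 0x7E:
            out.append(ch)                  # printable ASCII passes through
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)      # fixed width: 4 hex digits
        else:
            out.append("\\U%08X" % cp)      # fixed width: 8 hex digits
    return "".join(out)

escaped = escape_ntriples("caf\u00e9")
```

Because each escape has a fixed width and names the code point directly, the original characters can be recovered exactly. That is the property I18N want for IRIs as well as for string contents.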
09:32:04 <bwm> deployment issue is that if we follow charmod we create dependencies on xml guys having implemented it
09:33:10 <bwm> and we will be ready first
09:36:23 <bwm> two issues: holding up candidate rec process - maybe get waiver from director
09:36:35 <bwm> other - dependencies on specs which are not recs
09:36:49 <bwm> suggest we use SHOULD language
09:38:34 <gk> * gk My battery's about to die, I think. ...
09:38:57 <gk> gk has left #rdfcore-i18n
09:40:05 <bwm> bwm has quit