16:52:55 RRSAgent has joined #i18n
16:52:55 logging to http://www.w3.org/2008/07/21-i18n-irc
16:53:44 RRSAgent, set logs world-visible
16:54:03 Meeting: OWL + Internationalization
16:56:11 Axel, could you send a message to everyone saying that we'll use this channel? I have to disconnect my email to use IRC :-(
16:56:56 baojie has joined #i18n
16:57:12 yup
16:58:25 baojie has joined #i18n
16:58:28 SW_(ITST)1:00PM has now started
16:58:35 +??P0
16:58:43 fsasaki has joined #i18n
16:59:21 sandro has joined #i18n
17:00:10 +Addison_Phillips
17:00:49 hi, are we public or member-only here?
17:00:56 fsasaki has left #i18n
17:00:57 public on a # channel
17:01:04 fsasaki has joined #i18n
17:01:08 public
17:01:15 i18n is public... I'm actually asking if we should make minutes public
17:01:22 Ah.
17:01:25 public.
17:01:35 bmotik has joined #i18n
17:01:35 +baojie
17:01:36 RRSAgent, set logs world-visible
17:01:43 RRSAgent, make minutes
17:01:43 I have made the request to generate http://www.w3.org/2008/07/21-i18n-minutes.html aphillip_
17:01:44 What is the conference code?
17:01:49 4878
17:02:02 +Sandro
17:02:29 +??P3
17:02:30 Jie had collected a list of issues... (which I extend a bit)
17:02:32 Open issues for further discussion include:
17:02:32 * The choice of name space. Alternatives include "rif", "owl", "rdf" or "xsd". Note that the RIF Working Group [7] did not put "rif:text" into the xsd (XML Schema) namespace because such a datatype is not considered primitive.
17:02:32 * The construct's name, e.g., "text" or "internationalizedString".
17:02:32 * In language tag pattern matching, whether to allow case-insensitive matching [8].
17:02:32 * Whether to supersede RFC 3066 with RFC 4646 (Tags for Identifying Languages)
17:02:34 * Shall we define our own datatype hierarchy?
17:02:36 * Should the subtag hierarchy have semantic implications?
17:02:47 +??P8
17:02:53 Zakim, ??P8 is me
17:02:53 +bmotik; got it
17:03:15 zakim, mute me
17:03:15 bmotik should now be muted
17:03:21 Axel: could you paste link your extended issue list here?
17:03:56 I didn't put it online yet, took your list as a basis:
17:03:57 http://www.w3.org/2007/OWL/wiki/InternationalizedString#The_Proposal_owl:langPattern_.28OWL_Working_Group.29
17:05:15 I have made the request to generate http://www.w3.org/2008/07/21-i18n-minutes.html aphillip_
17:06:03 scribe: Felix
17:06:07 scribeNick: fsasaki
17:06:14 meeting: OWL / i18n meeting
17:06:40 alex: two proposals:
17:06:48 .. how to deal with internationalized text
17:06:58 .. one proposal to have one data type, or a hierarchy of data types
17:07:13 .. this has different implications depending on where we go
17:07:33 .. sub typing would have semantic implication
17:07:50 .. but not sure if the semantic implication is really wanted
17:08:09 .. would "en" vs "en-us" mean that if we query for "en" we also get "en-us"?
17:08:19 .. if we would have a type hierarchy we would get that
17:08:22 q+
17:08:36 .. even if we do that we need to make clear: how to define the value spaces and lexical spaces
17:09:09 .. if we have just one data type, as a lexical space we have strings with the "@" sign in the language tag
17:09:25 .. with one data type, we would have pairs of language tags and string parts
17:10:04 .. another question whether we go for RFC 4646 or RFC 3066 (which seems to be obsolete)
17:10:16 Zakim, unmute me
17:10:16 bmotik should no longer be muted
17:10:17 .. opinions on what I said?
17:10:36 alex: I think data type hierarchy is feasible, but not sure if it can be done easily
17:10:47 .. not sure how we could have semantic implication of the data type
17:11:02 who is speaking???
17:11:11 addison: language tag structure is important for operations like matching
17:11:25 .. most W3C technologies are not designed for dealing with structured strings
17:11:42 Zakim, who is on the call?
17:11:42 On the phone I see AxelPolleres, Addison_Phillips, baojie, Sandro, Felix (muted), bmotik
17:11:56 .. if I ask for let's say an i18n string in English, how do I get all English "en", "en-us" etc. with one request
17:12:11 .. if you construct the hierarchy of tags, as with sub tags
17:12:29 .. there is a lot of machinery for a straightforward kind of thing
17:12:43 Also, the sub-tags are not *only* about sub-strings, right?
17:12:47 .. matching algorithm is very simple string matching
17:13:11 xxx: alex said if we don't go for the data type hierarchy we can't have implications
17:13:15 .. I'm not sure about that
17:13:19 s/xxx/boris/
17:13:27 s/alex/axel/
17:13:33 .. currently OWL says we need a value space consisting of pairs
17:13:49 .. I think even if we have a hierarchy these strings need to be different
17:14:07 .. if we agree that the value space is a set of pairs (string, string) I don't see a lot of differences
17:14:26 .. we could provide a regex based on the 2nd element of the pair
17:15:05 Addison: concern: regex are good but they are limiting, they don't understand what language tags are about
17:15:29 sandro: Addison said using heavy semantic is overkill, but now you say regex is not enough
17:15:54 Addison: regex is not enough, they could work, but the ones for language tags are a bit complicated
17:16:11 static final String langtag_ex =
17:16:12 "(\\A[xX]([\\x2d]\\p{Alnum}{1,8})*\\z)"
17:16:12 + "|(((\\A\\p{Alpha}{2,8}(?=\\x2d|\\z)){1}"
17:16:12 + "(([\\x2d]\\p{Alpha}{3})(?=\\x2d|\\z)){0,3}"
17:16:12 + "([\\x2d]\\p{Alpha}{4}(?=\\x2d|\\z))?"
17:16:12 + "([\\x2d](\\p{Alpha}{2}|\\d{3})(?=\\x2d|\\z))?"
17:16:14 + "([\\x2d](\\d\\p{Alnum}{3}|\\p{Alnum}{5,8})(?=\\x2d|\\z))*)"
17:16:16 + "(([\\x2d]([a-wyzA-WYZ](?=\\x2d))([\\x2d](\\p{Alnum}{2,8})+)*))*"
17:16:18 + "([\\x2d][xX]([\\x2d]\\p{Alnum}{1,8})*)?)\\z";
17:17:27 Addison: the structure of language tags with "-" makes it sometimes difficult to use regex
17:17:43 q?
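The "very simple string matching" mentioned above, where a request for "en" should also return "en-us", can be sketched as follows. This is a minimal Python illustration of basic filtering as described in RFC 4647, not code from the meeting; the function name `basic_filter` and the sample tags are assumptions made for the example.

```python
from typing import Iterable, List


def basic_filter(language_range: str, tags: Iterable[str]) -> List[str]:
    """Return the tags matched by a basic language range.

    A range matches a tag if, ignoring case, it equals the tag or equals a
    prefix of the tag that is immediately followed by "-". The range "*"
    matches every tag.
    """
    rng = language_range.lower()
    matched = []
    for tag in tags:
        lowered = tag.lower()
        if rng == "*" or lowered == rng or lowered.startswith(rng + "-"):
            matched.append(tag)
    return matched


# Hypothetical sample data: "eng" is not matched by "en" because the
# prefix must end at a subtag boundary.
sample_tags = ["en", "en-US", "en-GB", "eng", "de-CH", "zh-Hant-TW"]
print(basic_filter("en", sample_tags))  # ['en', 'en-US', 'en-GB']
```

Comparing on subtag boundaries rather than raw substrings is what keeps "en" from matching "eng", which is the kind of structure a plain substring test would miss.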
17:17:52 boris: if we can agree on value space I don't see an issue
17:18:05 addison, did I understand correctly that your "exceptions" from the typical subtag pattern (reg-expressions) could be more or less "read off" from http://www.iana.org/assignments/language-subtag-registry
17:18:06 ?
17:18:28 sandro: think that the pair is the right way to go
17:18:46 axel: in the pair would "en" and "en-us" be disjoint?
17:18:51 axel, exceptions are a very small list in the registry (or in rfc)
17:19:07 boris: we are dealing with language of content here
17:19:24 .. if you talk about values you have to distinguish "en" and "en-us"
17:19:55 .. you could have a data type which is called lang-en which includes all values of "en", but that's a "class" thing
17:20:43 boris: first question is whether we deal with one or two values, hierarchy is secondary question
17:21:33 .. you can apply a function saying "give me all Chinese tags"
17:21:50 .. there is no need to put that into the semantics of the types, but have this in the built-in functions
17:22:11 axel: do you deal in RIF with data types and / or facets?
17:22:19 boris: what do you mean by facets here?
17:22:40 axel: we use facets e.g. from XML Schema, to create facets for e.g. integer
17:22:46 .. how do you do that in RIF?
17:22:59 boris: in RIF so far we only define a basic set of data types
17:23:14 .. we did not consider data type restrictions, facets at all yet
17:23:32 axel: something to start working from: we could agree on one data type (both working groups)
17:23:41 .. e.g. "internationalized string"
17:23:47 felix, it was the other way felix <-> axel
17:23:55 .. value set is a set of pairs (string, string)
17:23:58 s/felix/boris/
17:24:27 s/axel: something/boris: something/
17:24:51 Addison: question is how to deal with semantics of 2nd string
17:25:15 s/boris: in RIF/axel: in RIF/
17:25:22 .. XML Schema has a type "xs:language"
17:25:42 ..
that is a string, it can be used to represent xml:lang
17:25:55 please paste the link
17:26:06 (just for completeness)
17:26:13 http://www.w3.org/TR/xmlschema11-2/#language
17:26:58 boris: agree, would be better to refer to the language tag standard, RFC 3066
17:27:04 addison: better BCP 47
17:27:18 boris: afraid of referring to a moving target
17:27:56 addison: we (i18n core) have dealt with this elsewhere
17:27:59 s/boris/axel/
17:28:09 .. see e.g. XML Schema saying "RFC 3066 or its successor"
17:28:19 sandro: sounds OK to me
17:28:35 yyy: is an empty language tag valid?
17:28:54 addison: it's not a valid language tag, but can be used e.g. in xm:lang
17:28:57 s/yyy/boris/
17:29:09 s/xm:lang/xml:lang/
17:29:38 boris: Don't think that rdf:lang is affected here
17:29:56 s/rdf:lang/rdf land/
17:30:22 agreement
17:30:33 boris: the only thing rdf needs is that (x, "") be distinct from all (x, *)
17:30:43 (where * is not "")
17:30:53 boris' proposal to include xs:string as a special case sounds fine to me.
17:31:54 addison: can what you want be described as a standard form of language tag matching?
17:31:57 boris: yes
17:32:20 addison: BCP 47 has three algorithms for matching
17:32:26 .. these include how to provide a list
17:32:37 boris: as long as you can use some regex we could be fine
17:32:57 http://www.ietf.org/rfc/rfc4647.txt : extended filtering
17:33:10 addison: your approach is similar to extended filtering
17:33:11 http://www.inter-locale.com/ID/draft-ietf-ltru-matching-15.html#extMatching
17:33:13 boris, you mean e.g. that lang:en would be a subtype of (internationalized) string that covers all those with langtag en plus its subtags?
17:34:57 boris: basic internationalized string data type would allow to implement something like that on top of this
17:35:21 .. in OWL it may be not so easy since you are quantifying over ranges
17:35:31 ..
that might not be decidable, need to check expressibility
17:35:49 addison: yes, and language tags provide ways to deal with that
17:36:08 .. it's more complicated than other types like integer
17:36:39 alex: confirm boris proposal: have one type "i18n string"
17:36:51 .. if you want to have a type that covers all English strings, that would be a sub type?
17:36:53 boris: yes
17:37:15 .. that would be a sub set of the general value space
17:37:49 sandro: we get the same functionality, no matter if we do one data type or one per language tag?
17:38:10 .. the lexical spaces would be different
17:38:32 boris: the value spaces are sub sets, that is most interesting
17:38:32 basically, if i understand correctly, boris says, we can do the single datatype on top, and specify the type hierarchy below it afterwards.
17:39:08 alex: about boris' earlier proposal to include "string" in here:
17:39:27 .. not sure how we could distinguish "string" from language tag "en" string
17:39:41 boris: for OWL "i18n" string, to be able to embed it into RDF
17:39:47 .. we need a unique lexical representation
17:39:59 .. proposal from OWL WG is:
17:40:08 in (simple) RDF, btw "blabla" is different from "blabla"^^xs:string
17:40:10 .. lexical space of i18n string: text of string, "@", language tag
17:40:33 String "abc" without langTag is "abc@"^^owl:internationalizedString
17:41:10 "abc"^^xsd:string
17:41:28 is equivalent to the previous one
17:41:29 and we'd define xs:string as a subtype of internat.string which has exactly that valuespace? ... yes
17:42:01 "abc"@en
17:42:02 boris: obviously the lexical representation is not equivalent
17:42:10 What is "abc"@en?
17:42:33 boris: OWL WG says this is a syntactic shortcut for:
17:42:44 It is a syntactic shortcut for "abc@en"^^owl:internationalizedString
17:43:06 Yup, we had talked about this shortcut in RIF, BTW, but not yet approved it.
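The lexical form proposed above (text of the string, then "@", then the language tag, with an empty tag for plain strings) maps onto the pair value space roughly as sketched below. This is a hedged Python illustration only: the function names are hypothetical, and splitting on the last "@" is an assumption that relies on language tags never containing "@".

```python
from typing import Tuple


def parse_lexical_form(lexical: str) -> Tuple[str, str]:
    """Split a lexical form such as "abc@en" into the pair (text, language tag).

    The split is made at the LAST "@", so the text part may itself contain
    "@". An empty tag (as in "abc@") stands for a string with no language.
    """
    text, separator, lang = lexical.rpartition("@")
    if not separator:
        raise ValueError("lexical form must contain '@': %r" % lexical)
    return text, lang


def to_lexical_form(text: str, lang: str = "") -> str:
    """Build the lexical form for a (text, language tag) pair."""
    return text + "@" + lang


print(parse_lexical_form("abc@en"))               # ('abc', 'en')
print(parse_lexical_form("abc@"))                 # ('abc', '')
print(parse_lexical_form("user@example.org@en"))  # ('user@example.org', 'en')
print(to_lexical_form("chat", "en"))              # chat@en
```

Under this reading, the pair ("abc", "") stays distinct from ("abc", "en"), which is the property the RDF embedding discussed above depends on.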
17:43:08 boris: this is to be compatible with RDF and the representation syntax
17:43:19 You define "abc" as a syntactic shortcut for "abc@"^^owl:internationalizedString
17:44:24 boris: internally you can say that all literals have this structure
17:45:47 sandro: having the "@" sign in the string is kind of a hack
17:46:08 .. it technically works but I'm worried that it is pretty ugly
17:46:25 boris: that is the reason why we have the syntactic shortcut
17:46:49 sandro: in the examples we use the shortcut, but the tools may or may not use the shortcut
17:46:50 people aren't supposed to use it (just like people aren't supposed to use "a"^^rif:iri ... or no?)
17:46:56 "abc"^^lang:en
17:47:02 boris: otherwise you would always have to write the above
17:47:12 "chat"@en ==="chat"^^lang:en
17:47:38 instead of "chat@en"^^owl:internationalizedString
17:48:07 boris: do you agree that we still define the whole value space without the lexical space
17:48:21 sandro: seems fine by me, sounds like something which will not be serialized
17:48:42 boris: you could use owl:internationalizedString
17:49:17 .. we could call it "text"
17:49:21 addison: better name
17:49:36 .. using "internationalized" sounds as if other strings are not internationalized
17:49:38 text is simpler, agreement, it seems.
17:50:05 s/text/"text"/
17:50:34 Addison: better not to introduce an artificial distinction of strings if it's not necessary
17:51:49 sandro: I'm OK with the @ sign, the other proposal looked nicer
17:52:08 axel: boris convinced me that we need the value space to treat all values differently
17:52:25 .. we would not get that by type hierarchy with "lang"
17:53:02 basically... from before: "basically, if i understand correctly, boris says, we can do the single datatype on top, and specify the type hierarchy below it afterwards."
17:53:25 ..." boris, you mean e.g. that lang:en would be a subtype of (internationalized) string that covers all those with langtag en plus its subtags?"
17:54:13 lang:en *could* be defined by a reg exp.
17:54:49 boris: using lang "en" in the lexical space might be confusing
17:55:47 addison: yes, in the matching document RFC 4647 you have two names: the "sub tag" and the "range" which says "the thing a tag starts with"
17:56:35 boris: lang data type is rather used for querying
17:56:50 .. by not allowing a particular lexical representation we would make this clear
17:57:21 addison: that gets you out of the problem that in language ranges you can have "*", but not in language tags
17:57:41 The other extreme would be that "abc"^^lang:en is indeed the same as "abc"^^lang:en-us... just like "1.0"^^xs:decimal is indeed the same as "1"^^xs:integer ... this seems to be not wanted, yes?
17:57:46 boris: we could put some strong wording into the spec to make this clear
17:59:28 boris: we need to specify what the allowed items are
17:59:29 extended-language-range = (1*8ALPHA / "*")
17:59:29 *("-" (1*8alphanum / "*"))
17:59:44 addison: I would go to extended language range from RFC 4647, see above
17:59:53 .. reference those for matching
18:00:52 I have made the request to generate http://www.w3.org/2008/07/21-i18n-minutes.html aphillip_
18:01:25 -Felix
18:01:29 scribe: Sandro
18:01:36 scribenick: sandro
18:02:03 Addison: If you just say it's a string, then you're going to not be helping people very much. There is infrastructure for this, and it would be good to reference that.
18:02:21 What I mean is "lang:en" is then no longer a datatype, but a built-in function.
18:02:36 Boris: Axel is saying that *matching* is a more appropriate operation on this data type -- a built-in in RIF, a facet in OWL, that refers to RFC 4647.
18:04:09 Boris: In OWL, these facets are relatively easy. For strings you have regexp pattern facets, restricting the string. We could easily introduce a language-range facet, which is a query in this RFC 4647 language. All pairs which match this query. RIF could have a similar built-in function.
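The language-range facet described above, "all pairs which match this query", could look roughly like the sketch below. It is a Python rendering of the extended filtering steps in RFC 4647, written for illustration only: the names `extended_match` and `lang_range_facet` are hypothetical, and leaving out pairs with an empty language tag is an assumption, not something the meeting decided.

```python
from typing import Iterable, List, Tuple


def extended_match(language_range: str, tag: str) -> bool:
    """Extended filtering: does an extended language range match a tag?"""
    rng = language_range.lower().split("-")
    sub = tag.lower().split("-")
    # The first subtags must agree unless the range starts with "*".
    if rng[0] != "*" and rng[0] != sub[0]:
        return False
    ri, ti = 1, 1
    while ri < len(rng):
        if rng[ri] == "*":
            ri += 1                   # wildcard: skip it in the range
        elif ti >= len(sub):
            return False              # range asks for more than the tag has
        elif rng[ri] == sub[ti]:
            ri += 1
            ti += 1                   # subtags agree: advance both
        elif len(sub[ti]) == 1:
            return False              # singleton (e.g. "x") blocks skipping
        else:
            ti += 1                   # skip an intervening subtag in the tag
    return True


def lang_range_facet(pairs: Iterable[Tuple[str, str]],
                     language_range: str) -> List[Tuple[str, str]]:
    """Keep the (text, language tag) pairs whose tag matches the range.

    Assumption: pairs with an empty tag (plain strings) never match.
    """
    return [(text, lang) for text, lang in pairs
            if lang and extended_match(language_range, lang)]


# Hypothetical sample value-space members.
pairs = [("chat", "en"), ("chat", "fr"), ("colour", "en-GB"),
         ("Farbe", "de-Latn-DE"), ("abc", "")]
print(lang_range_facet(pairs, "en"))     # [('chat', 'en'), ('colour', 'en-GB')]
print(lang_range_facet(pairs, "de-DE"))  # [('Farbe', 'de-Latn-DE')]
```

Treating the range as a query over the second element of each pair keeps a single datatype while still answering "give me all English strings", which matches the conclusion that matching belongs in a facet or built-in rather than in a datatype hierarchy.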
18:04:41 Axel: Our conclusion -- real datatype hierarchy is not practical. Given that, this all sounds fine.
18:05:24 sandro: My inclination is rdf: as the prefix.
18:05:26 rdf:text?
18:05:27 +1
18:05:29 boris: I don't care.
18:05:37 I will summarize the recent emails and the discussion in an update document
18:05:37 Addison: sounds good to me.
18:06:03 boris said: I'm fine, I don't care (slight difference)
18:06:07 :-)
18:06:56 http://www.w3.org/2007/OWL/wiki/InternationalizedString#
18:07:42 http://www.w3.org/2007/OWL/wiki/InternationalizedStringSpec
18:08:01 ok
18:08:34 RRSAgent, make minutes
18:08:34 I have made the request to generate http://www.w3.org/2008/07/21-i18n-minutes.html aphillip_
18:09:01 -Addison_Phillips
18:09:02 -Sandro
18:09:02 -baojie
18:09:04 -bmotik
18:10:00 -AxelPolleres
18:10:01 SW_(ITST)1:00PM has ended
18:10:03 Attendees were AxelPolleres, Addison_Phillips, baojie, Sandro, Felix, bmotik
18:11:45 aphillip_ has left #i18n
20:19:14 Zakim has left #i18n