IRC log of i18n on 2008-07-21

Timestamps are in UTC.

16:52:55 [RRSAgent]
RRSAgent has joined #i18n
16:52:55 [RRSAgent]
logging to
16:53:44 [aphillip_]
RRSAgent, set logs world-visible
16:54:03 [aphillip_]
Meeting: OWL + Internationalization
16:56:11 [aphillip_]
Axel, could you send a message to everyone saying that we'll use this channel? I have to disconnect my email to use IRC :-(
16:56:56 [baojie]
baojie has joined #i18n
16:57:12 [AxelPolleres]
16:58:25 [baojie]
baojie has joined #i18n
16:58:28 [Zakim]
SW_(ITST)1:00PM has now started
16:58:35 [Zakim]
16:58:43 [fsasaki]
fsasaki has joined #i18n
16:59:21 [sandro]
sandro has joined #i18n
17:00:10 [Zakim]
17:00:49 [aphillip_]
hi, are we public or member-only here?
17:00:56 [fsasaki]
fsasaki has left #i18n
17:00:57 [sandro]
public on a # channel
17:01:04 [fsasaki]
fsasaki has joined #i18n
17:01:08 [fsasaki]
17:01:15 [aphillip_]
i18n is public... I'm actually asking if we should make minutes public
17:01:22 [sandro]
17:01:25 [sandro]
17:01:35 [bmotik]
bmotik has joined #i18n
17:01:35 [Zakim]
17:01:36 [aphillip_]
RRSAgent, set logs world-visible
17:01:43 [aphillip_]
RRSAgent, make minutes
17:01:43 [RRSAgent]
I have made the request to generate aphillip_
17:01:44 [bmotik]
What is the conference code?
17:01:49 [aphillip_]
17:02:02 [Zakim]
17:02:29 [Zakim]
17:02:30 [AxelPolleres]
Jie had collected a list of issues... (which I extend a bit)
17:02:32 [AxelPolleres]
Open issues for further discussion include:
17:02:32 [AxelPolleres]
* The choice of namespace. Alternatives include "rif", "owl", "rdf" or "xsd". Note that the RIF Working Group [7] did not put "rif:text" into the xsd (XML Schema) namespace because such a datatype is not considered primitive.
17:02:32 [AxelPolleres]
* The construct's name, e.g., "text" or "internationalizedString".
17:02:32 [AxelPolleres]
* In language tag pattern matching, whether to allow case-insensitive matching [8].
17:02:32 [AxelPolleres]
* Whether to supersede RFC 3066 with RFC 4646 (Tags for Identifying Languages)
17:02:34 [AxelPolleres]
* Should we define our own datatype hierarchy?
17:02:36 [AxelPolleres]
* Should the subtag hierarchy have semantic implications?
17:02:47 [Zakim]
17:02:53 [bmotik]
Zakim, ??P8 is me
17:02:53 [Zakim]
+bmotik; got it
17:03:15 [bmotik]
zakim, mute me
17:03:15 [Zakim]
bmotik should now be muted
17:03:21 [baojie]
Axel: could you paste the link to your extended issue list here?
17:03:56 [AxelPolleres]
I didn't put it online yet, took your list as a basis:
17:03:57 [AxelPolleres]
17:05:15 [RRSAgent]
I have made the request to generate aphillip_
17:06:03 [fsasaki]
scribe: Felix
17:06:07 [fsasaki]
scribeNick: fsasaki
17:06:14 [fsasaki]
meeting: OWL / i18n meeting
17:06:40 [fsasaki]
axel: two proposals:
17:06:48 [fsasaki]
.. how to deal with internationalized text
17:06:58 [fsasaki]
.. one proposal to have one data type, or a hierarchy of data types
17:07:13 [fsasaki]
.. this has different implications depending on where we go
17:07:33 [fsasaki]
.. sub typing would have semantic implication
17:07:50 [fsasaki]
.. but not sure if the semantic implication is really wanted
17:08:09 [fsasaki]
.. would "en" vs "en-us" mean that if we query for "en" we also get "en-us"?
17:08:19 [fsasaki]
.. if we would have a type hierarchy we would get that
17:08:22 [bmotik]
17:08:36 [fsasaki]
.. even if we do that we need to make clear: how to define the value spaces and lexical spaces
17:09:09 [fsasaki]
.. if we have just one data type, as a lexical space we have strings with the "@" sign in the language tag
17:09:25 [fsasaki]
.. with one data type, we would have pairs of language tags and string parts
17:10:04 [fsasaki]
.. another question whether we go for RFC 4646 or RFC 3066 (which seems to be obsolete)
17:10:16 [bmotik]
Zakim, unmute me
17:10:16 [Zakim]
bmotik should no longer be muted
17:10:17 [fsasaki]
.. opinions on what I said?
17:10:36 [fsasaki]
alex: I think data type hierarchy is feasible, but not sure if it can be done easily
17:10:47 [fsasaki]
.. not sure how we could have semantic implication of the data type
17:11:02 [AxelPolleres]
who is speaking???
17:11:11 [fsasaki]
addison: language tag structure is important for operations like matching
17:11:25 [fsasaki]
.. most W3C technologies are not designed for dealing with structured strings
17:11:42 [sandro]
Zakim, who is on the call?
17:11:42 [Zakim]
On the phone I see AxelPolleres, Addison_Phillips, baojie, Sandro, Felix (muted), bmotik
17:11:56 [fsasaki]
.. if I ask for let's say an i18n string in English, how do I get all English "en", "en-us" etc. with one request
17:12:11 [fsasaki]
.. if you construct the hierarchy of tags, as with sub tags
17:12:29 [fsasaki]
.. there is a lot of machinery for a straightforward kind of thing
17:12:43 [AxelPolleres]
Also, the sub-tags are not *only* about sub-strings, right?
17:12:47 [fsasaki]
.. matching algorithm is very simple string matching
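The "very simple string matching" mentioned here can be sketched roughly as follows. This is a minimal illustration of prefix-style matching in the spirit of basic filtering in RFC 4647 (section 3.3.1), not code from the meeting; the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper (not from the log): prefix matching of a language
// range such as "en" against a language tag such as "en-US".
final class BasicLangFilter {
    private BasicLangFilter() {}

    // Returns true when the tag equals the range, or starts with the range
    // followed by "-". Comparison is case-insensitive; "*" matches any tag.
    static boolean matches(String range, String tag) {
        if (range.equals("*")) {
            return true;
        }
        String r = range.toLowerCase(Locale.ROOT);
        String t = tag.toLowerCase(Locale.ROOT);
        return t.equals(r) || t.startsWith(r + "-");
    }
}
```

With this sketch, a request for "en" picks up "en", "en-us" and "en-GB" alike, while "eng" is rejected because the prefix must end at a subtag boundary.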
17:13:11 [fsasaki]
xxx: axel said if we don't go for the data type hierarchy we can't have implications
17:13:15 [fsasaki]
.. I'm not sure about that
17:13:19 [AxelPolleres]
17:13:27 [AxelPolleres]
17:13:33 [fsasaki]
.. currently OWL says we need a value space consisting of pairs
17:13:49 [fsasaki]
.. I think even if we have a hierarchy these strings need to be different
17:14:07 [fsasaki]
.. if we agree that the value space is a set of pairs (string, string) I don't see a lot of differences
17:14:26 [fsasaki]
.. we could provide a regex based on the 2nd element of the pair
17:15:05 [fsasaki]
Addison: concern: regex are good but they are limiting, they don't understand what language tags are about
17:15:29 [fsasaki]
sandro: Addison said using heavy semantic is overkill, but now you say regex is not enough
17:15:54 [fsasaki]
Addison: regex is not enough, they could work, but the ones for language tags are a bit complicated
17:16:11 [aphillip_]
static final String langtag_ex =
17:16:12 [aphillip_]
17:16:12 [aphillip_]
+ "|(((\\A\\p{Alpha}{2,8}(?=\\x2d|\\z)){1}"
17:16:12 [aphillip_]
+ "(([\\x2d]\\p{Alpha}{3})(?=\\x2d|\\z)){0,3}"
17:16:12 [aphillip_]
+ "([\\x2d]\\p{Alpha}{4}(?=\\x2d|\\z))?"
17:16:12 [aphillip_]
+ "([\\x2d](\\p{Alpha}{2}|\\d{3})(?=\\x2d|\\z))?"
17:16:14 [aphillip_]
+ "([\\x2d](\\d\\p{Alnum}{3}|\\p{Alnum}{5,8})(?=\\x2d|\\z))*)"
17:16:16 [aphillip_]
+ "(([\\x2d]([a-wyzA-WYZ](?=\\x2d))([\\x2d](\\p{Alnum}{2,8})+)*))*"
17:16:18 [aphillip_]
+ "([\\x2d][xX]([\\x2d]\\p{Alnum}{1,8})*)?)\\z";
17:17:27 [fsasaki]
Addison: the structure of language tags with "-" makes it sometimes difficult to use regex
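One way to picture the alternative to a single large regex is to split the tag on "-" and classify each subtag by its shape. The sketch below is a deliberately simplified illustration, assuming only the pattern language[-script][-region][-variant]*; it does not handle extlang, extensions, private use or grandfathered tags, and the class name SimpleLangTag is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified parser (not from the log). It only accepts
// language[-script][-region][-variant]* and rejects everything else.
final class SimpleLangTag {
    final String language;
    final String script;         // null when absent
    final String region;         // null when absent
    final List<String> variants; // empty when absent

    private SimpleLangTag(String language, String script, String region, List<String> variants) {
        this.language = language;
        this.script = script;
        this.region = region;
        this.variants = variants;
    }

    static SimpleLangTag parse(String tag) {
        // The -1 limit keeps trailing empty subtags so that "en-" is rejected.
        String[] parts = tag.split("-", -1);
        int i = 0;
        if (!parts[i].matches("[A-Za-z]{2,3}")) {
            throw new IllegalArgumentException("unsupported primary language subtag: " + parts[i]);
        }
        String language = parts[i++].toLowerCase(Locale.ROOT);
        String script = null;
        String region = null;
        List<String> variants = new ArrayList<>();
        // Script: exactly four letters.
        if (i < parts.length && parts[i].matches("[A-Za-z]{4}")) {
            script = parts[i++];
        }
        // Region: two letters or three digits.
        if (i < parts.length && parts[i].matches("[A-Za-z]{2}|[0-9]{3}")) {
            region = parts[i++];
        }
        // Variants: 5 to 8 alphanumerics, or a digit followed by 3 alphanumerics.
        while (i < parts.length && parts[i].matches("[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}")) {
            variants.add(parts[i++]);
        }
        if (i != parts.length) {
            throw new IllegalArgumentException("unsupported or malformed subtag: " + parts[i]);
        }
        return new SimpleLangTag(language, script, region, variants);
    }
}
```

Under these assumptions, "zh-Hant-TW" yields language "zh", script "Hant" and region "TW", whereas a tag with a private-use part such as "de-x-goethe" is rejected rather than guessed at.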
17:17:43 [sandro]
17:17:52 [fsasaki]
boris: if we can agree on value space I don't see an issue
17:18:05 [AxelPolleres]
addison, did I understand correctly that your "exceptions" from the typical subtag pattern (reg-expressions) could be more or less "read off" from
17:18:06 [AxelPolleres]
17:18:28 [fsasaki]
sandro: think that the pair is the right way to go
17:18:46 [fsasaki]
axel: in the pair would "en" and "en-us" be disjoint?
17:18:51 [aphillip_]
axel, exceptions are a very small list in the registry (or in rfc)
17:19:07 [fsasaki]
boris: we are dealing with language of content here
17:19:24 [fsasaki]
.. if you talk about values you have to distinguish "en" and "en-us"
17:19:55 [fsasaki]
.. you could have a data type which is called lang-en which includes all values of "en", but that's a "class" thing
17:20:43 [fsasaki]
boris: first question is whether we deal with one or two values, hierarchy is secondary question
17:21:33 [fsasaki]
.. you can apply a function saying "give me all chinese tags"
17:21:50 [fsasaki]
.. there is no need to put that into the semantics of the types, but have this in the built-in functions
17:22:11 [fsasaki]
axel: do you deal in RIF with data types and / or facets?
17:22:19 [fsasaki]
boris: what do you mean by facets here?
17:22:40 [fsasaki]
axel: we use facets e.g. from XML Schema, to create facets for e.g. integer
17:22:46 [fsasaki]
.. how do you do that in RIF?
17:22:59 [fsasaki]
boris: in RIF so far we only define a basic set of data types
17:23:14 [fsasaki]
.. we did not consider data type restrictions, facets at all yet
17:23:32 [fsasaki]
axel: something to start working from: we could agree on one data type (both working groups)
17:23:41 [fsasaki]
.. e.g. "internationalized string"
17:23:47 [AxelPolleres]
felix, it was the other way felix <-> axel
17:23:55 [fsasaki]
.. value set is a set of pairs (string, string)
17:23:58 [AxelPolleres]
17:24:27 [fsasaki]
s/axel: something/boris: something/
17:24:51 [fsasaki]
Addison: question is how to deal with semantics of 2nd string
17:25:15 [AxelPolleres]
s/boris: in RIF/axel: in RIF/
17:25:22 [fsasaki]
.. XML Schema has a type "xs:language"
17:25:42 [fsasaki]
.. that is a string, it can be used to represent xml:lang
17:25:55 [AxelPolleres]
please paste the link
17:26:06 [AxelPolleres]
(just for completeness)
17:26:13 [aphillip_]
17:26:58 [fsasaki]
boris: agree, would be better to refer to the language tag standard, RFC 3066
17:27:04 [fsasaki]
addison: better BCP 47
17:27:18 [fsasaki]
boris: afraid of referring to a moving target
17:27:56 [fsasaki]
addison: we (i18n core) have dealt with this elsewhere
17:27:59 [AxelPolleres]
17:28:09 [fsasaki]
.. see e.g. XML Schema saying "RFC 3066 or its successor"
17:28:19 [fsasaki]
sandro: sounds OK to me
17:28:35 [fsasaki]
yyy: is an empty language tag valid?
17:28:54 [fsasaki]
addison: it's not a valid language tag, but can be used e.g. in xml:lang
17:28:57 [AxelPolleres]
17:29:09 [fsasaki]
17:29:38 [fsasaki]
boris: Don't think that rdf:lang is affected here
17:29:56 [sandro]
s/rdf:lang/rdf land/
17:30:22 [fsasaki]
17:30:33 [sandro]
boris: the only thing rdf needs is that (x, "") be distinct from all (x, *)
17:30:43 [sandro]
(where * is not "")
17:30:53 [AxelPolleres]
boris' proposal to include xs:string as a special case sounds fine to me.
17:31:54 [fsasaki]
addison: can what you want be described as a standard form of language tag matching?
17:31:57 [fsasaki]
boris: yes
17:32:20 [fsasaki]
addison: BCP 47 has three algorithms for matching
17:32:26 [fsasaki]
.. these include how to provide a list
17:32:37 [fsasaki]
boris: as long as you can use some regex we could be fine
17:32:57 [aphillip_]
extended filtering
17:33:10 [fsasaki]
addison: your approach is similar to extended filtering
17:33:11 [aphillip_]
17:33:13 [AxelPolleres]
boris, you mean e.g. that lang:en would be a subtype of (internationalized) string that covers all those with langtag en plus its subtags?
17:34:57 [fsasaki]
boris: a basic internationalized string data type would allow implementing something like that on top of it
17:35:21 [fsasaki]
.. in OWL it may be not so easy since you are quantifying over ranges
17:35:31 [fsasaki]
.. that might not be decidable, need to check expressibility
17:35:49 [fsasaki]
addison: yes, and language tags provide ways to deal with that
17:36:08 [fsasaki]
.. it's more complicated than other types like integer
17:36:39 [fsasaki]
axel: confirm boris' proposal: have one type "i18n string"
17:36:51 [fsasaki]
.. if you want to have a type that covers all English strings, that would be a sub type?
17:36:53 [fsasaki]
boris: yes
17:37:15 [fsasaki]
.. that would be a sub set of the general value space
17:37:49 [fsasaki]
sandro: we get the same functionality, no matter if we do one data type or one per language tag?
17:38:10 [fsasaki]
.. the lexical spaces would be different
17:38:32 [fsasaki]
boris: the value spaces are sub sets, that is most interesting
17:38:32 [AxelPolleres]
basically, if i understand correctly, boris says, we can do the single datatype on top, and specify the type hierarchy below it afterwards.
17:39:08 [fsasaki]
axel: about boris' earlier proposal to include "string" in here:
17:39:27 [fsasaki]
.. not sure how we could distinguish "string" from language tag "en" string
17:39:41 [fsasaki]
boris: for OWL "i18n" string, to be able to embed it into RDF
17:39:47 [fsasaki]
.. we need a unique lexical representation
17:39:59 [fsasaki]
.. proposal from OWL WG is:
17:40:08 [AxelPolleres]
in (simple) RDF, btw "blabla" is different from "blabla"^^xs:string
17:40:10 [fsasaki]
.. lexical space of i18n string: text of string, "@", language tag
17:40:33 [bmotik]
String "abc" without langTag is "abc@"^^owl:internationalizedString
17:41:10 [bmotik]
17:41:28 [bmotik]
is equivalent to the previous one
17:41:29 [AxelPolleres]
and we'd define xs:string as a subtype of internat.string which has exactly that value space? ... yes
17:42:01 [bmotik]
17:42:02 [fsasaki]
boris: obviously the lexical representation is not equivalent
17:42:10 [bmotik]
What is "abc"@en?
17:42:33 [fsasaki]
boris: OWL WG says this is a syntactic shortcut for:
17:42:44 [bmotik]
It is a syntactic shortcut for "abc@en"^^owl:internationalizedString
17:43:06 [AxelPolleres]
Yup, we had talked about this shortcut in RIF, BTW, but not yet approved it.
17:43:08 [fsasaki]
boris: this is to be compatible with RDF and the representation syntax
17:43:19 [bmotik]
You define "abc" as a syntactic shortcut for "abc@"^^owl:internationalizedString
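The mapping discussed here, from the lexical form (text, "@", language tag) to a value that is a pair of strings, can be sketched as below. This is only an illustration under stated assumptions: the class name TextLiteral is hypothetical, and splitting at the last "@" is an assumption made so that text containing "@" still parses (a language tag itself cannot contain "@"). Language tags are compared exactly as written, since case handling was still an open issue in the discussion.

```java
import java.util.Objects;

// Hypothetical value class (not from the log): a pair (text, language tag).
final class TextLiteral {
    final String text;
    final String langTag; // "" means "no language tag"

    TextLiteral(String text, String langTag) {
        this.text = Objects.requireNonNull(text);
        this.langTag = Objects.requireNonNull(langTag);
    }

    // Parses a lexical form such as "abc@en" or "abc@".
    // Assumption: the separator is the LAST "@" in the string.
    static TextLiteral fromLexical(String lexical) {
        int at = lexical.lastIndexOf('@');
        if (at < 0) {
            throw new IllegalArgumentException("lexical form has no '@' separator: " + lexical);
        }
        return new TextLiteral(lexical.substring(0, at), lexical.substring(at + 1));
    }

    // A plain string "abc" is treated as shorthand for the lexical form "abc@".
    static TextLiteral fromPlain(String plain) {
        return new TextLiteral(plain, "");
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof TextLiteral)) {
            return false;
        }
        TextLiteral that = (TextLiteral) other;
        return text.equals(that.text) && langTag.equals(that.langTag);
    }

    @Override
    public int hashCode() {
        return Objects.hash(text, langTag);
    }
}
```

In this sketch ("abc", "") is a different value from ("abc", "en"), and ("abc", "en") is different from ("abc", "en-us"), which is the distinctness property asked for earlier in the discussion.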
17:44:24 [fsasaki]
boris: internally you can say that all literals have this structure
17:45:47 [fsasaki]
sandro: having the "@" sign in the string is kind of a hack
17:46:08 [fsasaki]
.. it technically works but I'm worried that it is pretty ugly
17:46:25 [fsasaki]
boris: that is the reason why we have the syntactic shortcut
17:46:49 [fsasaki]
sandro: in the examples we use the shortcut, but the tools may or may not use the shortcut
17:46:50 [AxelPolleres]
people aren't supposed to use it (just like people aren't supposed to use "a"^^rif:iri ... or no?)
17:46:56 [bmotik]
17:47:02 [fsasaki]
boris: otherwise you would always have to write the above
17:47:12 [sandro]
"chat"@en ==="chat"^^lang:en
17:47:38 [sandro]
instead of "chat@en"^^owl:internationalizedString
17:48:07 [fsasaki]
boris: do you agree that we still define the whole value space without the lexical space
17:48:21 [fsasaki]
sandro: seems fine by me, sounds like something which will not be serialized
17:48:42 [fsasaki]
boris: you could use owl:internationalizedString
17:49:17 [fsasaki]
.. we could call it "text"
17:49:21 [fsasaki]
addison: better name
17:49:36 [fsasaki]
.. using "internationalized" suggests that other strings are not internationalized
17:49:38 [AxelPolleres]
text is simpler, agreement, it seems.
17:50:05 [AxelPolleres]
17:50:34 [fsasaki]
Addison: better not to introduce an artificial distinction of strings if it's not necessary
17:51:49 [fsasaki]
sandro: I'm OK with the @ sign, the other proposal looked nicer
17:52:08 [fsasaki]
axel: boris convinced me that we need the value space to treat all values differently
17:52:25 [fsasaki]
.. we would not get that by type hierarchy with "lang"
17:53:02 [AxelPolleres]
basically... from before: "basically, if i understand correctly, boris says, we can do the single datatype on top, and specify the type hierarchy below it afterwards."
17:53:25 [AxelPolleres]
..." boris, you mean e.g. that lang:en would be a subtype of (internationalized) string that covers all those with langtag en plus its subtags?"
17:54:13 [AxelPolleres]
lang:en *could* be defined by a reg exp.
17:54:49 [fsasaki]
boris: using lang "en" in the lexical space might be confusing
17:55:47 [fsasaki]
addison: yes, in the matching document RFC 4647 you have two names: the "sub tag" and the "range" which says "the thing a tag starts with"
17:56:35 [fsasaki]
boris: lang data type is rather used for querying
17:56:50 [fsasaki]
.. by not allowing a particular lexical representation we would make this clear
17:57:21 [fsasaki]
addison: that gets you out of the problem that in language ranges you can have "*", but not in language tags
17:57:41 [AxelPolleres]
The other extreme would be that "abc"^^lang:en is indeed the same as "abc"^^lang:en-us... just like "1.0"^^xs:decimal is indeed the same as "1"^^xs:integer ... this seems not to be wanted, yes?
17:57:46 [fsasaki]
boris: we could put some strong wording to the spec to make this clear
17:59:28 [fsasaki]
boris: we need to specify what the allowed items are
17:59:29 [aphillip_]
extended-language-range = (1*8ALPHA / "*")
17:59:29 [aphillip_]
*("-" (1*8alphanum / "*"))
17:59:44 [fsasaki]
addison: I would go to extended language range from RFC 4647, see above
17:59:53 [fsasaki]
.. reference those for matching
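To make the extended language range concrete, here is a rough sketch of the ABNF pasted above as a regex, together with the extended filtering steps from RFC 4647 (section 3.3.2). It is an illustration rather than a vetted implementation; the class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper (not from the log): extended language ranges.
final class ExtendedLangFilter {
    private ExtendedLangFilter() {}

    // extended-language-range = (1*8ALPHA / "*") *("-" (1*8alphanum / "*"))
    private static final Pattern RANGE =
            Pattern.compile("(?:[A-Za-z]{1,8}|\\*)(?:-(?:[A-Za-z0-9]{1,8}|\\*))*");

    static boolean isWellFormedRange(String range) {
        return RANGE.matcher(range).matches();
    }

    // Extended filtering: compares subtag by subtag, case-insensitively.
    static boolean matches(String range, String tag) {
        String[] r = range.toLowerCase(Locale.ROOT).split("-");
        String[] t = tag.toLowerCase(Locale.ROOT).split("-");
        // The first subtags must match, unless the range starts with "*".
        if (!r[0].equals("*") && !r[0].equals(t[0])) {
            return false;
        }
        int ri = 1;
        int ti = 1;
        while (ri < r.length) {
            if (r[ri].equals("*")) {
                ri++;                 // wildcard: skip to the next range subtag
            } else if (ti >= t.length) {
                return false;         // range still has subtags, tag has none left
            } else if (r[ri].equals(t[ti])) {
                ri++;                 // subtags match: advance both
                ti++;
            } else if (t[ti].length() == 1) {
                return false;         // singleton (e.g. "x") blocks further skipping
            } else {
                ti++;                 // skip a non-matching subtag in the tag
            }
        }
        return true;                  // all range subtags were consumed
    }
}
```

For example, the range "de-*-DE" matches "de-DE" and "de-Latn-DE-1996" under this sketch, but not "de" or "de-x-DE". Note that "*" is allowed in a range but never in a language tag itself.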
18:00:52 [RRSAgent]
I have made the request to generate aphillip_
18:01:25 [Zakim]
18:01:29 [sandro]
scribe: Sandro
18:01:36 [sandro]
scribenick: sandro
18:02:03 [sandro]
Addison: If you just say it's a string, then you're going to not be helping people very much. There is infrastructure for this, and it would be good to reference that.
18:02:21 [AxelPolleres]
What I mean is "lang:en" is then no longer a datatype, but a built-in function.
18:02:36 [sandro]
Boris: Axel is saying that *matching* is a more appropriate operation on this data type -- a built-in in RIF, a facet in OWL, that refers to RFC 4647.
18:04:09 [sandro]
Boris: In OWL, these facets are relatively easy. For strings you have regexp pattern facets, restricting the string. We could easily introduce a language-range facet, which is a query in this RFC 4647 language. All pairs which match this query. RIF could have a similar built-in function.
18:04:41 [sandro]
Axel: Our conclusion -- a real datatype hierarchy is not practical. Given that, this all sounds fine.
18:05:24 [sandro]
sandro: My inclination is rdf: as the prefix.
18:05:26 [AxelPolleres]
18:05:27 [AxelPolleres]
18:05:29 [sandro]
boris: I don't care.
18:05:37 [baojie]
I will summarize the recent emails and the discussion in an update document
18:05:37 [sandro]
Addison: sounds good to me.
18:06:03 [AxelPolleres]
boris said: I'm fine, I don't care (slight difference)
18:06:07 [AxelPolleres]
18:06:56 [baojie]
18:07:42 [baojie]
18:08:01 [baojie]
18:08:34 [aphillip_]
RRSAgent, make minutes
18:08:34 [RRSAgent]
I have made the request to generate aphillip_
18:09:01 [Zakim]
18:09:02 [Zakim]
18:09:02 [Zakim]
18:09:04 [Zakim]
18:10:00 [Zakim]
18:10:01 [Zakim]
SW_(ITST)1:00PM has ended
18:10:03 [Zakim]
Attendees were AxelPolleres, Addison_Phillips, baojie, Sandro, Felix, bmotik
18:11:45 [aphillip_]
aphillip_ has left #i18n
20:19:14 [Zakim]
Zakim has left #i18n