IRC log of i18n on 2008-07-21
Timestamps are in UTC.
- 16:52:55 [RRSAgent]
- RRSAgent has joined #i18n
- 16:52:55 [RRSAgent]
- logging to http://www.w3.org/2008/07/21-i18n-irc
- 16:53:44 [aphillip_]
- RRSAgent, set logs world-visible
- 16:54:03 [aphillip_]
- Meeting: OWL + Internationalization
- 16:56:11 [aphillip_]
- Axel, could you send a message to everyone saying that we'll use this channel? I have to disconnect my email to use IRC :-(
- 16:56:56 [baojie]
- baojie has joined #i18n
- 16:57:12 [AxelPolleres]
- yup
- 16:58:25 [baojie]
- baojie has joined #i18n
- 16:58:28 [Zakim]
- SW_(ITST)1:00PM has now started
- 16:58:35 [Zakim]
- +??P0
- 16:58:43 [fsasaki]
- fsasaki has joined #i18n
- 16:59:21 [sandro]
- sandro has joined #i18n
- 17:00:10 [Zakim]
- +Addison_Phillips
- 17:00:49 [aphillip_]
- hi, are we public or member-only here?
- 17:00:56 [fsasaki]
- fsasaki has left #i18n
- 17:00:57 [sandro]
- public on a # channel
- 17:01:04 [fsasaki]
- fsasaki has joined #i18n
- 17:01:08 [fsasaki]
- public
- 17:01:15 [aphillip_]
- i18n is public... I'm actually asking if we should make minutes public
- 17:01:22 [sandro]
- Ah.
- 17:01:25 [sandro]
- public.
- 17:01:35 [bmotik]
- bmotik has joined #i18n
- 17:01:35 [Zakim]
- +baojie
- 17:01:36 [aphillip_]
- RRSAgent, set logs world-visible
- 17:01:43 [aphillip_]
- RRSAgent, make minutes
- 17:01:43 [RRSAgent]
- I have made the request to generate http://www.w3.org/2008/07/21-i18n-minutes.html aphillip_
- 17:01:44 [bmotik]
- What is the conference code?
- 17:01:49 [aphillip_]
- 4878
- 17:02:02 [Zakim]
- +Sandro
- 17:02:29 [Zakim]
- +??P3
- 17:02:30 [AxelPolleres]
- Jie had collected a list of issues... (which I extend a bit)
- 17:02:32 [AxelPolleres]
- Open issues for further discussion include:
- 17:02:32 [AxelPolleres]
* The choice of name space. Alternatives include "rif", "owl", "rdf" or "xsd". Note that the RIF Working Group [7] did not put "rif:text" into the xsd (XML Schema) namespace because such a datatype is not considered primitive.
- 17:02:32 [AxelPolleres]
- * The construct's name, e.g., "text" or "internationalizedString".
- 17:02:32 [AxelPolleres]
* In language tag pattern matching, whether to allow case-insensitive matching [8].
- 17:02:32 [AxelPolleres]
* Whether to supersede RFC 3066 with RFC 4646 (Tags for Identifying Languages)
- 17:02:34 [AxelPolleres]
* Shall we define our own datatype hierarchy?
- 17:02:36 [AxelPolleres]
- * Should the subtag hierarchy have semantic implications?
- 17:02:47 [Zakim]
- +??P8
- 17:02:53 [bmotik]
- Zakim, ??P8 is me
- 17:02:53 [Zakim]
- +bmotik; got it
- 17:03:15 [bmotik]
- zakim, mute me
- 17:03:15 [Zakim]
- bmotik should now be muted
- 17:03:21 [baojie]
- Axel: could you paste a link to your extended issue list here?
- 17:03:56 [AxelPolleres]
- I didn't put it online yet, took your list as a basis:
- 17:03:57 [AxelPolleres]
- http://www.w3.org/2007/OWL/wiki/InternationalizedString#The_Proposal_owl:langPattern_.28OWL_Working_Group.29
- 17:05:15 [RRSAgent]
- I have made the request to generate http://www.w3.org/2008/07/21-i18n-minutes.html aphillip_
- 17:06:03 [fsasaki]
- scribe: Felix
- 17:06:07 [fsasaki]
- scribeNick: fsasaki
- 17:06:14 [fsasaki]
- meeting: OWL / i18n meeting
- 17:06:40 [fsasaki]
- axel: two proposals:
- 17:06:48 [fsasaki]
- .. how to deal with internationalized text
- 17:06:58 [fsasaki]
- .. one proposal to have one data type, or a hierarchy of data types
- 17:07:13 [fsasaki]
- .. this has different implications depending on where we go
- 17:07:33 [fsasaki]
- .. sub typing would have semantic implication
- 17:07:50 [fsasaki]
- .. but not sure if the semantic implication is really wanted
- 17:08:09 [fsasaki]
- .. would "en" vs "en-us" mean that if we query for "en" we also get "en-us"?
- 17:08:19 [fsasaki]
- .. if we would have a type hierarchy we would get that
- 17:08:22 [bmotik]
- q+
- 17:08:36 [fsasaki]
- .. even if we do that we need to make clear: how to define the value spaces and lexical spaces
- 17:09:09 [fsasaki]
- .. if we have just one data type, as a lexical space we have strings with the "@" sign in the language tag
- 17:09:25 [fsasaki]
- .. with one data type, we would have pairs of language tags and string parts
- 17:10:04 [fsasaki]
- .. another question whether we go for RFC 4646 or RFC 3066 (which seems to be obsolete)
- 17:10:16 [bmotik]
- Zakim, unmute me
- 17:10:16 [Zakim]
- bmotik should no longer be muted
- 17:10:17 [fsasaki]
- .. opinions on what I said?
- 17:10:36 [fsasaki]
- alex: I think data type hierarchy is feasible, but not sure if it can be done easily
- 17:10:47 [fsasaki]
- .. not sure how we could have semantic implication of the data type
- 17:11:02 [AxelPolleres]
- who is speaking???
- 17:11:11 [fsasaki]
- addison: language tag structure is important for operations like matching
- 17:11:25 [fsasaki]
- .. most W3C technologies are not designed for dealing with structured strings
- 17:11:42 [sandro]
- Zakim, who is on the call?
- 17:11:42 [Zakim]
- On the phone I see AxelPolleres, Addison_Phillips, baojie, Sandro, Felix (muted), bmotik
- 17:11:56 [fsasaki]
- .. if I ask for let's say an i18n string in English, how do I get all English "en", "en-us" etc. with one request
- 17:12:11 [fsasaki]
- .. if you construct the hierarchy of tags, as with sub tags
- 17:12:29 [fsasaki]
- .. there is a lot of machinery for a straightforward kind of thing
- 17:12:43 [AxelPolleres]
- Also, the sub-tags are not *only* about sub-strings, right?
- 17:12:47 [fsasaki]
- .. matching algorithm is very simple string matching
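Note: the "very simple string matching" mentioned here corresponds to basic filtering in RFC 4647 (section 3.3.1), where a range such as "en" matches "en" and every tag that starts with "en-". A minimal Java sketch follows; the class name `BasicLangFilter` and method name `matches` are made up for illustration and do not come from the meeting, and the comparison is assumed to be case-insensitive as RFC 4647 specifies.

```java
import java.util.Locale;

// Illustrative sketch of RFC 4647 basic filtering; names are hypothetical.
class BasicLangFilter {
    // A basic range matches a tag when both are equal ignoring case,
    // or when the tag starts with the range followed by a hyphen.
    static boolean matches(String range, String tag) {
        if (range.equals("*")) {
            return true; // the wildcard range matches every tag
        }
        String r = range.toLowerCase(Locale.ROOT);
        String t = tag.toLowerCase(Locale.ROOT);
        return t.equals(r) || t.startsWith(r + "-");
    }

    public static void main(String[] args) {
        System.out.println(matches("en", "en-US")); // subtag boundary respected
        System.out.println(matches("en", "eng"));   // plain prefix is not enough
    }
}
```

The hyphen check is what separates this from a naive prefix test: "en" must not match the unrelated tag "eng".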
- 17:13:11 [fsasaki]
- xxx: alex said if we don't go for the data type hierarchy we can't have implications
- 17:13:15 [fsasaki]
- .. I'm not sure about that
- 17:13:19 [AxelPolleres]
- s/xxx/boris/
- 17:13:27 [AxelPolleres]
- s/alex/axel/
- 17:13:33 [fsasaki]
- .. currently OWL says we need a value space consisting of pairs
- 17:13:49 [fsasaki]
- .. I think even if we have a hierarchy these strings need to be different
- 17:14:07 [fsasaki]
- .. if we agree that the value space is a set of pairs (string, string) I don't see a lot of differences
- 17:14:26 [fsasaki]
- .. we could provide a regex based on the 2nd element of the pair
- 17:15:05 [fsasaki]
- Addison: concern: regex are good but they are limiting, they don't understand what language tags are about
- 17:15:29 [fsasaki]
- sandro: Addison said using heavy semantic is overkill, but now you say regex is not enough
- 17:15:54 [fsasaki]
- Addison: regex is not enough, they could work, but the ones for language tags are a bit complicated
- 17:16:11 [aphillip_]
- static final String langtag_ex =
- 17:16:12 [aphillip_]
- "(\\A[xX]([\\x2d]\\p{Alnum}{1,8})*\\z)"
- 17:16:12 [aphillip_]
- + "|(((\\A\\p{Alpha}{2,8}(?=\\x2d|\\z)){1}"
- 17:16:12 [aphillip_]
- + "(([\\x2d]\\p{Alpha}{3})(?=\\x2d|\\z)){0,3}"
- 17:16:12 [aphillip_]
- + "([\\x2d]\\p{Alpha}{4}(?=\\x2d|\\z))?"
- 17:16:12 [aphillip_]
- + "([\\x2d](\\p{Alpha}{2}|\\d{3})(?=\\x2d|\\z))?"
- 17:16:14 [aphillip_]
- + "([\\x2d](\\d\\p{Alnum}{3}|\\p{Alnum}{5,8})(?=\\x2d|\\z))*)"
- 17:16:16 [aphillip_]
- + "(([\\x2d]([a-wyzA-WYZ](?=\\x2d))([\\x2d](\\p{Alnum}{2,8})+)*))*"
- 17:16:18 [aphillip_]
- + "([\\x2d][xX]([\\x2d]\\p{Alnum}{1,8})*)?)\\z";
- 17:17:27 [fsasaki]
- Addison: the structure of language tags with "-" makes it sometimes difficult to use regex
- 17:17:43 [sandro]
- q?
- 17:17:52 [fsasaki]
- boris: if we can agree on value space I don't see an issue
- 17:18:05 [AxelPolleres]
- addison, did I understand correctly that your "exceptions" from the typical subtag pattern (reg-expressions) could be more or less "read off" from http://www.iana.org/assignments/language-subtag-registry
- 17:18:06 [AxelPolleres]
- ?
- 17:18:28 [fsasaki]
- sandro: think that the pair is the right way to go
- 17:18:46 [fsasaki]
- axel: in the pair would "en" and "en-us" be disjoint?
- 17:18:51 [aphillip_]
- axel, exceptions are a very small list in the registry (or in rfc)
- 17:19:07 [fsasaki]
- boris: we are dealing with language of content here
- 17:19:24 [fsasaki]
- .. if you talk about values you have to distinguish "en" and "en-us"
- 17:19:55 [fsasaki]
- .. you could have a data type which is called lang-en which includes all values of "en", but that's a "class" thing
- 17:20:43 [fsasaki]
- boris: first question is whether we deal with one or two values, hierarchy is secondary question
- 17:21:33 [fsasaki]
- .. you can apply a function saying "give me all chinese tags"
- 17:21:50 [fsasaki]
- .. there is no need to put that into the semantics of the types, but have this in the built-in functions
- 17:22:11 [fsasaki]
- axel: do you deal in RIF with data types and / or facets?
- 17:22:19 [fsasaki]
- boris: what do you mean by facets here?
- 17:22:40 [fsasaki]
- axel: we use facets e.g. from XML Schema, to create facets for e.g. integer
- 17:22:46 [fsasaki]
- .. how do you do that in RIF?
- 17:22:59 [fsasaki]
- boris: in RIF so far we only define a basic set of data types
- 17:23:14 [fsasaki]
- .. we did not consider data type restrictions, facets at all yet
- 17:23:32 [fsasaki]
- axel: something to start working from: we could agree on one data type (both working group)
- 17:23:41 [fsasaki]
- .. e.g. "internationalized string"
- 17:23:47 [AxelPolleres]
- felix, it was the other way felix <-> axel
- 17:23:55 [fsasaki]
- .. value set is a set of pairs (string, string)
- 17:23:58 [AxelPolleres]
- s/felix/boris/
- 17:24:27 [fsasaki]
- s/axel: something/boris: something/
- 17:24:51 [fsasaki]
- Addison: question is how to deal with semantics of 2nd string
- 17:25:15 [AxelPolleres]
- s/boris: in RIF/axel: in RIF/
- 17:25:22 [fsasaki]
- .. XML Schema has a type "xs:language"
- 17:25:42 [fsasaki]
- .. that is a string, it can be used to represent xml:lang
- 17:25:55 [AxelPolleres]
- please paste the link
- 17:26:06 [AxelPolleres]
- (just for completeness)
- 17:26:13 [aphillip_]
- http://www.w3.org/TR/xmlschema11-2/#language
- 17:26:58 [fsasaki]
- boris: agree, would be better to refer to the language tag standard, RFC 3066
- 17:27:04 [fsasaki]
- addison: better BCP 47
- 17:27:18 [fsasaki]
- boris: afraid of referring to a moving target
- 17:27:56 [fsasaki]
- addison: we (i18n core) have dealt with this elsewhere
- 17:27:59 [AxelPolleres]
- s/boris/axel/
- 17:28:09 [fsasaki]
- .. see e.g. XML Schema saying "RFC 3066 or its successor"
- 17:28:19 [fsasaki]
- sandro: sounds OK to me
- 17:28:35 [fsasaki]
- yyy: is an empty language tag valid?
- 17:28:54 [fsasaki]
- addison: it's not a valid language tag, but can be used e.g. in xm:lang
- 17:28:57 [AxelPolleres]
- s/yyy/boris/
- 17:29:09 [fsasaki]
- s/xm:lang/xml:lang/
- 17:29:38 [fsasaki]
- boris: Don't think that rdf:lang is affected here
- 17:29:56 [sandro]
- s/rdf:lang/rdf land/
- 17:30:22 [fsasaki]
- agreement
- 17:30:33 [sandro]
- boris: the only thing rdf needs is that (x, "") be distinct from all (x, *)
- 17:30:43 [sandro]
- (where * is not "")
- 17:30:53 [AxelPolleres]
- boris' proposal to include xs:string as a special case sounds fine to me.
- 17:31:54 [fsasaki]
- addison: can what you want be described as a standard form of language tag matching?
- 17:31:57 [fsasaki]
- boris: yes
- 17:32:20 [fsasaki]
- addison: BCP 47 has three algorithms for matching
- 17:32:26 [fsasaki]
- .. these include how to provide a list
- 17:32:37 [fsasaki]
- boris: as long as you can use some regex we could be fine
- 17:32:57 [aphillip_]
- http://www.ietf.org/rfc/rfc4647.txt : extended filtering
- 17:33:10 [fsasaki]
- addison: your approach is similar to extended filtering
- 17:33:11 [aphillip_]
- http://www.inter-locale.com/ID/draft-ietf-ltru-matching-15.html#extMatching
- 17:33:13 [AxelPolleres]
- boris, you mean e.g. that lang:en would be a subtype of (internationalized) string that covers all those with langtag en plus its subtags?
- 17:34:57 [fsasaki]
- boris: basic internationalized string data type would allow implementing something like that on top of this
- 17:35:21 [fsasaki]
- .. in OWL it may be not so easy since you are quantifying over ranges
- 17:35:31 [fsasaki]
- .. that might not be decidable, need to check expressibility
- 17:35:49 [fsasaki]
- addison: yes, and language tags provide ways to deal with that
- 17:36:08 [fsasaki]
- .. it's more complicated than other types like integer
- 17:36:39 [fsasaki]
- axel: confirm boris' proposal: have one type "i18n string"
- 17:36:51 [fsasaki]
- .. if you want to have a type that covers all English strings, that would be a sub type?
- 17:36:53 [fsasaki]
- boris: yes
- 17:37:15 [fsasaki]
- .. that would be a sub set of the general value space
- 17:37:49 [fsasaki]
- sandro: we get the same functionality, no matter if we do one data type or one per language tag?
- 17:38:10 [fsasaki]
- .. the lexical spaces would be different
- 17:38:32 [fsasaki]
- boris: the value spaces are sub sets, that is most interesting
- 17:38:32 [AxelPolleres]
- basically, if i understand correctly, boris says, we can do the single datatype on top, and specify the type hierarchy below it afterwards.
- 17:39:08 [fsasaki]
- axel: about boris' earlier proposal to include "string" in here:
- 17:39:27 [fsasaki]
- .. not sure how we could distinguish "string" from language tag "en" string
- 17:39:41 [fsasaki]
- boris: for OWL "i18n" string, to be able to embed it into RDF
- 17:39:47 [fsasaki]
- .. we need a unique lexical representation
- 17:39:59 [fsasaki]
- .. proposal from OWL WG is:
- 17:40:08 [AxelPolleres]
- in (simple) RDF, btw "blabla" is different from "blabla"^^xs:string
- 17:40:10 [fsasaki]
- .. lexical space of i18n string: text of string, "@", language tag
- 17:40:33 [bmotik]
- String "abc" without langTag is "abc@"^^owl:internationalizedString
- 17:41:10 [bmotik]
- "abc"^^xsd:string
- 17:41:28 [bmotik]
- is equivalent to the previous one
- 17:41:29 [AxelPolleres]
- and we'd define xs:string as a subtype of internat.string which has exactly that valuespace? ... yes
- 17:42:01 [bmotik]
- "abc"@en
- 17:42:02 [fsasaki]
- boris: obviously the lexical representation is not equivalent
- 17:42:10 [bmotik]
- What is "abc"@en?
- 17:42:33 [fsasaki]
- boris: OWL WG says this is a syntactic shortcut for:
- 17:42:44 [bmotik]
- It is a syntactic shortcut for "abc@en"^^owl:internationalizedString
- 17:43:06 [AxelPolleres]
- Yup, we had talked about this shortcut in RIF, BTW, but not yet approved it.
- 17:43:08 [fsasaki]
- boris: this is to be compatible with RDF and the representation syntax
- 17:43:19 [bmotik]
- You define "abc" as a syntactic shortcut for "abc@"^^owl:internationalizedString
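Note: the lexical form discussed here (text, then "@", then an optional language tag) maps onto the value space of (string, string) pairs. A minimal Java sketch of that mapping follows. It assumes the split happens at the last "@", which is safe only because language tags consist of alphanumerics and hyphens and so cannot contain "@" themselves; the class name `TextLiteral` and its fields are hypothetical and not part of any OWL or RIF API.

```java
// Illustrative sketch: turns a lexical form like "abc@en" into a
// (text, language tag) pair. Names are hypothetical.
class TextLiteral {
    final String text;
    final String lang; // empty string means "no language tag"

    TextLiteral(String text, String lang) {
        this.text = text;
        this.lang = lang;
    }

    // Assumption: the language tag follows the LAST '@', so the text
    // part may itself contain '@' characters.
    static TextLiteral parse(String lexical) {
        int at = lexical.lastIndexOf('@');
        if (at < 0) {
            throw new IllegalArgumentException("missing '@' in: " + lexical);
        }
        return new TextLiteral(lexical.substring(0, at), lexical.substring(at + 1));
    }

    public static void main(String[] args) {
        TextLiteral plain = parse("abc@");   // plain string, empty tag
        TextLiteral en = parse("abc@en");    // English-tagged string
        System.out.println("[" + plain.text + "|" + plain.lang + "]");
        System.out.println("[" + en.text + "|" + en.lang + "]");
    }
}
```

Under this reading, "abc@" and "abc@en" yield the distinct pairs ("abc", "") and ("abc", "en"), which is the distinction boris asks for between untagged and tagged values.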
- 17:44:24 [fsasaki]
- boris: internally you can say that all literals have this structure
- 17:45:47 [fsasaki]
- sandro: having the "@" sign in the string is kind of a hack
- 17:46:08 [fsasaki]
- .. it technically works but I'm worried that it is pretty ugly
- 17:46:25 [fsasaki]
- boris: that is the reason why we have the syntactic shortcut
- 17:46:49 [fsasaki]
- sandro: in the examples we use the shortcut, but the tools may or may not use the shortcut
- 17:46:50 [AxelPolleres]
- people aren't supposed to use it (just like people aren't supposed to use "a"^^rif:iri ... or no?)
- 17:46:56 [bmotik]
- "abc"^^lang:en
- 17:47:02 [fsasaki]
- boris: otherwise you would always have to write the above
- 17:47:12 [sandro]
- "chat"@en ==="chat"^^lang:en
- 17:47:38 [sandro]
- instead of "chat@en"^^owl:internationalizedString
- 17:48:07 [fsasaki]
- boris: do you agree that we still define the whole value space without the lexical space
- 17:48:21 [fsasaki]
- sandro: seems fine by me, sounds like something which will not be serialized
- 17:48:42 [fsasaki]
- boris: you could use owl:internationalizedString
- 17:49:17 [fsasaki]
- .. we could call it "text"
- 17:49:21 [fsasaki]
- addison: better name
- 17:49:36 [fsasaki]
- .. using "internationalized" sounds as if other strings are not internationalized
- 17:49:38 [AxelPolleres]
- text is simpler, agreement, it seems.
- 17:50:05 [AxelPolleres]
- s/text/"text"/
- 17:50:34 [fsasaki]
- Addison: better not to introduce an artificial distinction of strings if it's not necessary
- 17:51:49 [fsasaki]
- sandro: I'm OK with the @ sign, the other proposal looked nicer
- 17:52:08 [fsasaki]
- axel: boris convinced me that we need the value space to treat all values differently
- 17:52:25 [fsasaki]
- .. we would not get that by type hierarchy with "lang"
- 17:53:02 [AxelPolleres]
- basically... from before: "basically, if i understand correctly, boris says, we can do the single datatype on top, and specify the type hierarchy below it afterwards."
- 17:53:25 [AxelPolleres]
- ..." boris, you mean e.g. that lang:en would be a subtype of (internationalized) string that covers all those with langtag en plus its subtags?"
- 17:54:13 [AxelPolleres]
- lang:en *could* be defined by a reg exp.
- 17:54:49 [fsasaki]
- boris: using lang "en" in the lexical space might be confusing
- 17:55:47 [fsasaki]
- addison: yes, in the matching document RFC 4647 you have two names: the "sub tag" and the "range" which says "the thing a tag starts with"
- 17:56:35 [fsasaki]
- boris: lang data type is rather used for querying
- 17:56:50 [fsasaki]
- .. by not allowing a particular lexical representation we would make this clear
- 17:57:21 [fsasaki]
- addison: that gets you out of the problem that in language ranges you can have "*", but not in language tags
- 17:57:41 [AxelPolleres]
- The other extreme would be that "abc"^^lang:en is indeed the same as "abc"^^lang:en-us... just like "1.0"^^xs:decimal is indeed the same as "1"^^xs:integer ... this seems not to be wanted, yes?
- 17:57:46 [fsasaki]
- boris: we could put some strong wording to the spec to make this clear
- 17:59:28 [fsasaki]
- boris: we need to specify what the allowed items are
- 17:59:29 [aphillip_]
- extended-language-range = (1*8ALPHA / "*")
- 17:59:29 [aphillip_]
- *("-" (1*8alphanum / "*"))
- 17:59:44 [fsasaki]
- addison: I would go to extended language range from RFC 4647, see above
- 17:59:53 [fsasaki]
- .. reference those for matching
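Note: extended filtering over the extended-language-range syntax above is described in RFC 4647, section 3.3.2. A minimal Java sketch of that procedure follows, assuming both inputs are already syntactically valid; the class name `ExtendedLangFilter` and method name `matches` are invented for illustration and were not proposed in the meeting.

```java
import java.util.Locale;

// Illustrative sketch of RFC 4647 extended filtering; names are hypothetical.
class ExtendedLangFilter {
    static boolean matches(String range, String tag) {
        String[] r = range.toLowerCase(Locale.ROOT).split("-");
        String[] t = tag.toLowerCase(Locale.ROOT).split("-");

        // The first subtags must agree unless the range starts with "*".
        if (!r[0].equals("*") && !r[0].equals(t[0])) {
            return false;
        }
        int ri = 1;
        int ti = 1;
        while (ri < r.length) {
            if (r[ri].equals("*")) {
                ri++;                       // wildcard: skip it in the range
            } else if (ti >= t.length) {
                return false;               // tag exhausted before the range
            } else if (r[ri].equals(t[ti])) {
                ri++;                       // subtags agree: advance both
                ti++;
            } else if (t[ti].length() == 1) {
                return false;               // never skip past a singleton such as "x"
            } else {
                ti++;                       // skip an unmatched tag subtag
            }
        }
        return true;                        // every range subtag was satisfied
    }

    public static void main(String[] args) {
        System.out.println(matches("de-*-DE", "de-Latn-DE"));
        System.out.println(matches("de-*-DE", "de-x-DE"));
    }
}
```

Unlike a regex over the whole tag, this walks the subtags one by one, which is how it handles wildcards in ranges while still refusing to look inside private-use or extension sequences.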
- 18:00:52 [RRSAgent]
- I have made the request to generate http://www.w3.org/2008/07/21-i18n-minutes.html aphillip_
- 18:01:25 [Zakim]
- -Felix
- 18:01:29 [sandro]
- scribe: Sandro
- 18:01:36 [sandro]
- scribenick: sandro
- 18:02:03 [sandro]
- Addison: If you just say it's a string, then you're going to not be helping people very much. There is infrastructure for this, and it would be good to reference that.
- 18:02:21 [AxelPolleres]
- What I mean is "lang:en" is then no longer a datatype, but a built-in function.
- 18:02:36 [sandro]
- Boris: Axel is saying that *matching* is a more appropriate operation on this data type -- a built-in in RIF, a facet in OWL, that refers to RFC 4647.
- 18:04:09 [sandro]
- Boris: In OWL, these facets are relatively easy. For strings you have regexp pattern facets, restricting the string. We could easily introduce a language-range facet, which is a query in this RFC 4647 language. All pairs which match this query. RIF could have a similar built-in function.
- 18:04:41 [sandro]
- Axel: Our conclusion -- real datatype hierarchy is not practical. Given that, this all sounds fine.
- 18:05:24 [sandro]
- sandro: My inclination is rdf: as the prefix.
- 18:05:26 [AxelPolleres]
- rdf:text?
- 18:05:27 [AxelPolleres]
- +1
- 18:05:29 [sandro]
- boris: I don't care.
- 18:05:37 [baojie]
- I will summarize the recent emails and the discussion in an update document
- 18:05:37 [sandro]
- Addison: sounds good to me.
- 18:06:03 [AxelPolleres]
- boris said: I'm fine, I don't care (slight difference)
- 18:06:07 [AxelPolleres]
- :-)
- 18:06:56 [baojie]
- http://www.w3.org/2007/OWL/wiki/InternationalizedString#
- 18:07:42 [baojie]
- http://www.w3.org/2007/OWL/wiki/InternationalizedStringSpec
- 18:08:01 [baojie]
- ok
- 18:08:34 [aphillip_]
- RRSAgent, make minutes
- 18:08:34 [RRSAgent]
- I have made the request to generate http://www.w3.org/2008/07/21-i18n-minutes.html aphillip_
- 18:09:01 [Zakim]
- -Addison_Phillips
- 18:09:02 [Zakim]
- -Sandro
- 18:09:02 [Zakim]
- -baojie
- 18:09:04 [Zakim]
- -bmotik
- 18:10:00 [Zakim]
- -AxelPolleres
- 18:10:01 [Zakim]
- SW_(ITST)1:00PM has ended
- 18:10:03 [Zakim]
- Attendees were AxelPolleres, Addison_Phillips, baojie, Sandro, Felix, bmotik
- 18:11:45 [aphillip_]
- aphillip_ has left #i18n
- 20:19:14 [Zakim]
- Zakim has left #i18n