See also: IRC log
<aphillip_> Axel, could you send a message to everyone saying that we'll use this channel? I have to disconnect my email to use IRC :-(
<AxelPolleres> yup
<aphillip_> hi, are we public or member-only here?
<sandro> public on a # channel
<fsasaki> public
<aphillip_> i18n is public... I'm actually asking if we should make minutes public
<sandro> Ah.
<sandro> public.
<bmotik> What is the conference code?
<aphillip_> 4878
<AxelPolleres> Jie had collected a list of issues... (which I extend a bit)
<AxelPolleres> Open issues for further discussion include:
<AxelPolleres> * The choice of namespace. Alternatives include "rif", "owl", "rdf" or "xsd". Note that the RIF Working Group [7] did not put "rif:text" into the xsd (XML Schema) namespace because such a datatype is not considered primitive.
<AxelPolleres> * The construct's name, e.g., "text" or "internationalizedString".
<AxelPolleres> * In language tag pattern matching, whether to allow case-insensitive matching [8].
<AxelPolleres> * Whether to supersede RFC 3066 with RFC 4646 (Tags for Identifying Languages)
<AxelPolleres> * Should we define our own datatype hierarchy?
<AxelPolleres> * Should the subtag hierarchy have semantic implications?
<baojie> Axel: could you paste the link to your extended issue list here?
<AxelPolleres> I didn't put it online yet, took your list as a basis:
<AxelPolleres> http://www.w3.org/2007/OWL/wiki/InternationalizedString#The_Proposal_owl:langPattern_.28OWL_Working_Group.29
<fsasaki> scribe: Felix
<fsasaki> scribeNick: fsasaki
<scribe> meeting: OWL / i18n meeting
axel: two proposals:
... how to deal with internationalized text
... one proposal to have one data type, or a hierarchy of data types
... this has different implications depending on where we go
... sub typing would have semantic implication
... but not sure if the semantic implication is really wanted
... would "en" vs "en-us" mean that if we query for "en" we also get "en-us"?
... if we would have a type hierarchy we would get that
... even if we do that we need to make clear: how to define the value spaces and lexical spaces
... if we have just one data type, as a lexical space we have strings followed by the "@" sign and the language tag
... with one data type, we would have pairs of language tags and string parts
... another question is whether we go for RFC 4646 or RFC 3066 (which seems to be obsolete)
... opinions on what I said?
... I think data type hierarchy is feasible, but not sure if it can be done easily
... not sure how we could have semantic implication of the data type
<AxelPolleres> who is speaking???
addison: language tag structure is important for operations like matching
... most W3C technologies are not designed for dealing with structured strings
... if I ask for let's say an i18n string in English, how do I get all English "en", "en-us" etc. with one request
... if you construct the hierarchy of tags, as with sub tags
... there is a lot of machinery for a straightforward kind of thing
<AxelPolleres> Also, the sub-tags are not *only* about sub-strings, right?
addison: matching algorithm is very simple string matching
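The simple string matching Addison refers to can be illustrated with a minimal sketch of prefix-style matching, which RFC 4647 calls basic filtering: a range matches a tag if the two are equal ignoring case, or if the range is a prefix of the tag and the next character is "-". The class and method names are hypothetical and were not discussed in the meeting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch of prefix-style ("basic filtering") language tag matching.
// BasicLangFilter, matches and filter are illustrative names only.
final class BasicLangFilter {

    // A range matches a tag if they are equal ignoring case, or if the
    // range is a prefix of the tag and the next character is "-".
    static boolean matches(String range, String tag) {
        if (range.equals("*")) {
            return true; // the wildcard range matches every tag
        }
        String r = range.toLowerCase(Locale.ROOT);
        String t = tag.toLowerCase(Locale.ROOT);
        return t.equals(r) || t.startsWith(r + "-");
    }

    // Return every tag in the list that falls under the given range.
    static List<String> filter(String range, List<String> tags) {
        List<String> out = new ArrayList<>();
        for (String tag : tags) {
            if (matches(range, tag)) {
                out.add(tag);
            }
        }
        return out;
    }
}
```

With this sketch, one request for the range "en" would return "en", "en-us" and "en-GB", but not "eng" or "de".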
boris: axel said if we don't go for the data type hierarchy we can't have implications
... I'm not sure about that
... currently OWL says we need a value space consisting of pairs
... I think even if we have a hierarchy these strings need to be different
... if we agree that the value space is a set of pairs (string, string) I don't see a lot of differences
... we could provide a regex based on the 2nd element of the pair
Addison: concern: regexes are good but they are limiting, they don't understand what language tags are about
sandro: Addison said using heavy semantics is overkill, but now you say regex is not enough
Addison: regex is not enough; regexes could work, but the ones for language tags are a bit complicated
<aphillip_> static final String langtag_ex =
<aphillip_> "(\\A[xX]([\\x2d]\\p{Alnum}{1,8})*\\z)"
<aphillip_> + "|(((\\A\\p{Alpha}{2,8}(?=\\x2d|\\z)){1}"
<aphillip_> + "(([\\x2d]\\p{Alpha}{3})(?=\\x2d|\\z)){0,3}"
<aphillip_> + "([\\x2d]\\p{Alpha}{4}(?=\\x2d|\\z))?"
<aphillip_> + "([\\x2d](\\p{Alpha}{2}|\\d{3})(?=\\x2d|\\z))?"
<aphillip_> + "([\\x2d](\\d\\p{Alnum}{3}|\\p{Alnum}{5,8})(?=\\x2d|\\z))*)"
<aphillip_> + "(([\\x2d]([a-wyzA-WYZ](?=\\x2d))([\\x2d](\\p{Alnum}{2,8})+)*))*"
<aphillip_> + "([\\x2d][xX]([\\x2d]\\p{Alnum}{1,8})*)?)\\z";
Addison: the structure of language tags with "-" makes it sometimes difficult to use regex
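As an alternative to maintaining a long regex by hand, a well-formedness check can be delegated to a BCP 47 parser. The sketch below uses `java.util.Locale.Builder`, which was added to the JDK after this meeting and was not part of the discussion; the class name `LangTagCheck` is hypothetical. It checks syntax only and does not consult the IANA subtag registry.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

// Sketch: syntax check for a language tag using the JDK's own parser.
// LangTagCheck is an illustrative name, not an existing API.
final class LangTagCheck {

    // Returns true if the JDK accepts the tag as a well-formed BCP 47 tag.
    static boolean isWellFormed(String tag) {
        if (tag.isEmpty()) {
            // Locale.Builder treats "" as "reset the builder", not as an
            // error, so the empty string is rejected explicitly here.
            return false;
        }
        try {
            new Locale.Builder().setLanguageTag(tag);
            return true;
        } catch (IllformedLocaleException e) {
            return false;
        }
    }
}
```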
boris: if we can agree on value space I don't see an issue
<AxelPolleres> addison, did I understand correctly that your "exceptions" from the typical subtag pattern (reg-expressions) could be more or less "read off" from http://www.iana.org/assignments/language-subtag-registry
<AxelPolleres> ?
sandro: think that the pair is the right way to go
axel: in the pair would "en" and "en-us" be disjoint?
<aphillip_> axel, exceptions are a very small list in the registry (or in rfc)
boris: we are dealing with language of content here
... if you talk about values you have to distinguish "en" and "en-us"
... you could have a data type which is called lang-en which includes all values of "en", but that's a "class" thing
... first question is whether we deal with one or two values, hierarchy is secondary question
... you can apply a function saying "give me all Chinese tags"
... there is no need to put that into the semantics of the types, but have this in the built-in functions
boris: do you deal in RIF with data types and / or facets?
axel: what do you mean by facets here?
boris: we use facets e.g. from XML Schema, to create facets for e.g. integer
... how do you do that in RIF?
axel: in RIF so far we only define a basic set of data types
... we did not consider data type restrictions, facets at all yet
boris: something to start working from: we could agree on one data type (both working groups)
... e.g. "internationalized string"
<AxelPolleres> felix, it was the other way boris <-> axel
boris: value set is a set of pairs (string, string)
Addison: question is how to deal with semantics of 2nd string
... XML Schema has a type "xs:language"
... that is a string, it can be used to represent xml:lang
<AxelPolleres> please paste the link
<AxelPolleres> (just for completeness)
<aphillip_> http://www.w3.org/TR/xmlschema11-2/#language
boris: agree, would be better to refer to the language tag standard, RFC 3066
addison: better BCP 47
axel: afraid of referring to a moving target
addison: we (i18n core) have dealt with this elsewhere
... see e.g. XML Schema saying "RFC 3066 or its successor"
sandro: sounds OK to me
boris: is an empty language tag valid?
addison: it's not a valid language tag, but can be used e.g. in xml:lang
boris: Don't think that rdf land is affected here
agreement
<sandro> boris: the only thing rdf needs is that (x, "") be distinct from all (x, *)
<sandro> (where * is not "")
<AxelPolleres> boris' proposal to include xs:string as a special case sounds fine to me.
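The value space discussed here, a set of (string, language tag) pairs in which the empty tag stands for a plain string, might be sketched as follows. `TextValue` is a hypothetical name. Lower-casing the tag is an assumption made for this sketch only, since case-insensitive comparison was still listed as an open issue.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical value type: one (string, language tag) pair.
final class TextValue {
    final String string;
    final String lang; // "" means "no language tag", i.e. a plain string

    TextValue(String string, String lang) {
        this.string = Objects.requireNonNull(string);
        // Assumption: tags compare case-insensitively, so normalize them.
        this.lang = Objects.requireNonNull(lang).toLowerCase(Locale.ROOT);
    }

    // Two values are equal only if both components are equal, so
    // ("chat", "en"), ("chat", "en-us") and ("chat", "") are all distinct.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TextValue)) {
            return false;
        }
        TextValue other = (TextValue) o;
        return string.equals(other.string) && lang.equals(other.lang);
    }

    @Override
    public int hashCode() {
        return Objects.hash(string, lang);
    }
}
```

Keeping the pair flat like this leaves any "en covers en-us" behaviour to a separate matching function rather than to equality.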
addison: can what you want be described as a standard form of language tag matching?
boris: yes
addison: BCP 47 has three algorithms for matching
... these include how to provide a list
boris: as long as you can use some regex we could be fine
<aphillip_> http://www.ietf.org/rfc/rfc4647.txt : extended filtering
addison: your approach is similar to extended filtering
<aphillip_> http://www.inter-locale.com/ID/draft-ietf-ltru-matching-15.html#extMatching
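Extended filtering, as described in RFC 4647, compares a range and a tag subtag by subtag and lets "*" in the range stand for any sequence of subtags. The following is a minimal sketch of that comparison; the class name `ExtendedLangFilter` is hypothetical, and the input is assumed to be a syntactically valid range and tag.

```java
import java.util.Locale;

// Sketch of RFC 4647 extended filtering; names are illustrative only.
final class ExtendedLangFilter {

    static boolean matches(String range, String tag) {
        String[] r = range.toLowerCase(Locale.ROOT).split("-");
        String[] t = tag.toLowerCase(Locale.ROOT).split("-");

        // The first subtags must match, unless the range starts with "*".
        if (!r[0].equals("*") && !r[0].equals(t[0])) {
            return false;
        }
        int ri = 1; // position in the range
        int ti = 1; // position in the tag
        while (ri < r.length) {
            if (r[ri].equals("*")) {
                ri++;                 // wildcard: move to the next range subtag
            } else if (ti >= t.length) {
                return false;         // the tag ran out before the range did
            } else if (r[ri].equals(t[ti])) {
                ri++;                 // subtags agree: advance both
                ti++;
            } else if (t[ti].length() == 1) {
                return false;         // a singleton (e.g. "x") blocks skipping
            } else {
                ti++;                 // skip this tag subtag and try the next
            }
        }
        return true;                  // every range subtag was accounted for
    }
}
```

Under this sketch the range "de-*-DE" matches "de-DE", "de-Latn-DE" and "de-DE-x-goethe", but not "de", "de-Deva" or "de-x-DE".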
<AxelPolleres> boris, you mean e.g. that lang:en would be a subtype of (internationalized) string that covers all those with langtag en plus its subtags?
boris: basic internationalized string data type would allow to implement something like that on top of this
... in OWL it may be not so easy since you are quantifying over ranges
... that might not be decidable, need to check expressibility
addison: yes, and language tags provide ways to deal with that
... it's more complicated than other types like integer
axel: confirm boris' proposal: have one type "i18n string"
... if you want to have a type that covers all English strings, that would be a sub type?
boris: yes
... that would be a sub set of the general value space
sandro: we get the same functionality, no matter if we do one data type or one per language tag?
... the lexical spaces would be different
boris: the value spaces are sub sets, that is most interesting
<AxelPolleres> basically, if i understand correctly, boris says, we can do the single datatype on top, and specify the type hierarchy below it afterwards.
axel: about boris' earlier proposal to include "string" in here:
... not sure how we could distinguish "string" from language tag "en" string
boris: for OWL "i18n" string, to be able to embed it into RDF
... we need a unique lexical representation
... proposal from OWL WG is:
<AxelPolleres> in (simple) RDF, btw "blabla" is different from "blabla"^^xs:string
boris: lexical space of i18n string: text of string, "@", language tag
<bmotik> String "abc" without langTag is "abc@"^^owl:internationalizedString
<bmotik> "abc"^^xsd:string
<bmotik> is equivalent to the previous one
<AxelPolleres> and we'd define xs:string as a subtype of internat. string which has exactly that value space? ... yes
<bmotik> "abc"@en
boris: obviously the lexical representation is not equivalent
<bmotik> What is "abc"@en?
boris: OWL WG says this is a syntactic shortcut for:
<bmotik> It is a syntactic shortcut for "abc@en"^^owl:internationalizedString
<AxelPolleres> Yup, we had talked about this shortcut in RIF, BTW, but not yet approved it.
boris: this is to be compatible with RDF and the representation syntax
<bmotik> You define "abc" as a syntactic shortcut for "abc@"^^owl:internationalizedString
boris: internally you can say that all literals have this structure
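The lexical form described above (text of the string, then "@", then the language tag) can be mapped back to a pair by splitting at an "@". The sketch below splits at the last "@"; that choice is an assumption of this example, justified by the fact that language tags consist only of letters, digits and hyphens, so any "@" inside the text itself stays with the text. `TextLexical` and its methods are hypothetical names.

```java
// Sketch: converting between the "string@lang" lexical form and its parts.
// All names here are illustrative, not taken from the OWL or RIF drafts.
final class TextLexical {

    // "abc@en" -> {"abc", "en"}; "abc@" -> {"abc", ""}.
    // Splits at the LAST "@", so "a@b.org@en" -> {"a@b.org", "en"}.
    static String[] parse(String lexical) {
        int at = lexical.lastIndexOf('@');
        if (at < 0) {
            throw new IllegalArgumentException("missing '@' in: " + lexical);
        }
        return new String[] {
            lexical.substring(0, at),
            lexical.substring(at + 1)
        };
    }

    // Expand the shortcut for a plain literal: "abc" -> "abc@".
    static String fromPlain(String s) {
        return s + "@";
    }

    // Expand the shortcut for a tagged literal: "abc"@en -> "abc@en".
    static String fromTagged(String s, String lang) {
        return s + "@" + lang;
    }
}
```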
sandro: having the "@" sign in the string is kind of a hack
... it technically works but I'm worried that it is pretty ugly
boris: that is the reason why we have the syntactic shortcut
sandro: in the examples we use the shortcut, but the tools may or may not use the shortcut
<AxelPolleres> people aren't supposed to use it (just like people aren't supposed to use "a"^^rif:iri ... or no?)
<bmotik> "abc"^^lang:en
boris: otherwise you would always have to write the above
<sandro> "chat"@en ==="chat"^^lang:en
<sandro> instead of "chat@en"^^owl:internationalizedString
boris: do you agree that we still define the whole value space without the lexical space
sandro: seems fine by me, sounds like something which will not be serialized
boris: you could use owl:internationalizedString
... we could call it "text"
addison: better name
... using "internationalized" suggests that other strings are not internationalized
<AxelPolleres> "text" is simpler, agreement, it seems.
Addison: better not to introduce an artificial distinction of strings if it's not necessary
sandro: I'm OK with the @ sign, the other proposal looked nicer
axel: boris convinced me that we need the value space to treat all values differently
... we would not get that by type hierarchy with "lang"
<AxelPolleres> basically... from before: "basically, if i understand correctly, boris says, we can do the single datatype on top, and specify the type hierarchy below it afterwards."
<AxelPolleres> ..." boris, you mean e.g. that lang:en would be a subtype of (internationalized) string that covers all those with langtag en plus its subtags?"
<AxelPolleres> lang:en *could* be defined by a reg exp.
boris: using lang "en" in the lexical space might be confusing
addison: yes, in the matching document RFC 4647 you have two names: the "sub tag" and the "range" which says "the thing a tag starts with"
boris: lang data type is rather used for querying
... by not allowing a particular lexical representation we would make this clear
addison: that gets you out of the problem that in language ranges you can have "*", but not in language tags
<AxelPolleres> The other extreme would be that "abc"^^lang:en is indeed the same as "abc"^^lang:en-us... just like "1.0"^^xs:decimal is indeed the same as "1"^^xs:integer ... this seems not to be wanted, yes?
boris: we could put some strong wording to the spec to make this clear
... we need to specify what the allowed items are
<aphillip_> extended-language-range = (1*8ALPHA / "*")
<aphillip_> *("-" (1*8alphanum / "*"))
addison: I would go to extended language range from RFC 4647, see above
... reference those for matching
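The extended-language-range grammar pasted above translates fairly directly into a regular expression: a first subtag of one to eight letters or "*", followed by any number of "-"-separated subtags of one to eight alphanumerics or "*". The sketch below is one such translation; `ExtendedRangeSyntax` is a hypothetical name, and the check is purely syntactic.

```java
import java.util.regex.Pattern;

// Sketch: syntax check for an RFC 4647 extended language range.
// ExtendedRangeSyntax is an illustrative name only.
final class ExtendedRangeSyntax {

    // extended-language-range = (1*8ALPHA / "*") *("-" (1*8alphanum / "*"))
    private static final Pattern RANGE = Pattern.compile(
        "(?:[A-Za-z]{1,8}|\\*)(?:-(?:[A-Za-z0-9]{1,8}|\\*))*");

    // True if the whole string is a syntactically valid extended range.
    static boolean isRange(String s) {
        return RANGE.matcher(s).matches();
    }
}
```

This accepts ranges such as "en", "*" and "de-*-DE", which illustrates Addison's point that "*" is allowed in a range even though it can never appear in a language tag.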
<sandro> scribe: Sandro
<scribe> scribenick: sandro
Addison: If you just say it's a string, then you're going to not be helping people very much. There is infrastructure for this, and it would be good to reference that.
<AxelPolleres> What I mean is "lang:en" is then no longer a datatype, but a built-in function.
Boris: Axel is saying that *matching* is a more appropriate operation on this data type -- a built-in in RIF, a facet in OWL, that refers to RFC 4647.
... In OWL, these facets are relatively easy. For strings you have regexp pattern facets, restricting the string. We could easily introduce a language-range facet, which is a query in this RFC 4647 language. It selects all pairs which match this query. RIF could have a similar built-in function.
Axel: Our conclusion -- real datatype hierarchy is not practical. Given that, this all sounds fine.
sandro: My inclination is rdf: as the prefix.
<AxelPolleres> rdf:text?
<AxelPolleres> +1
boris: I don't care.
<baojie> I will summarize the recent emails and the discussion in an update document
Addison: sounds good to me.
<AxelPolleres> boris said: I'm fine, I don't care (slight difference)
<AxelPolleres> :-)
<baojie> http://www.w3.org/2007/OWL/wiki/InternationalizedString#
<baojie> http://www.w3.org/2007/OWL/wiki/InternationalizedStringSpec
<baojie> ok
Scribes: Felix, Sandro (ScribeNicks: fsasaki, sandro)
Date (from IRC log name): 21 Jul 2008
Minutes URL (guessed by scribe.perl): http://www.w3.org/2008/07/21-i18n-minutes.html