See also: IRC log
<fsasaki> checking attendance
<fsasaki> scribe: daveL
<fsasaki> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0090.html
<fsasaki> http://www.doodle.com/pn6xa86rfbypmd2k
felix: there is no apparent slot that works. felix will distribute a weekly alternating proposal
<fsasaki> scribe: fsasaki
dave: haven't updated the mapping page a lot
... there is more work to be done to formalize the mapping
... and come up with examples
... I think we want to focus on the XLIFF 1.2 mapping first
... we were hoping that XLIFF 2 would be stable, but there is a delay
... focusing on XLIFF 1.2 also helps with putting a demonstrator together
yves: dave summarized everything right
... in okapi we implemented the ITS mapping on what we have
... it is partially implemented, ongoing
dave: we will come back shortly on that
... wrt interop between solas and CMS lion, also using okapi
... with the preparation for rome
phil: it is now on our critical path for our implementation
... david said he would have a prototype a few weeks ago
... even if there is nothing final
... even if we would have a rough direction
... e.g. yves said that with xliff 1.2, he would use mrk markup
... even if we had directions on what is easily acceptable
... otherwise it could hold up my implementation
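A hedged sketch of the mrk-based direction mentioned above; the extension mtype and the ITS attribute carried on mrk are assumptions for illustration, not the agreed mapping:

```xml
<!-- sketch only: its:locNote on mrk is an assumption, not the final mapping -->
<trans-unit id="1">
  <source xml:lang="en">Press <mrk mtype="term">Start</mrk> on the
    <mrk mtype="x-its"
         xmlns:its="http://www.w3.org/2005/11/its"
         its:locNote="Product name, do not translate">ACME 2000</mrk>.</source>
</trans-unit>
```

mtype="term" is a standard XLIFF 1.2 value; other ITS data categories would need an extension mtype such as x-its plus namespaced attributes.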
yves: the xliff 1.2 mapping is what we used for implementations
... most of the time it made sense
... we have tackled some of the standoff stuff
... it is also in the git repository (for okapi, scribe assumes)?
<Yves_> yes
phil: provenance and loc quality issue, rating are relevant for us here
<Yves_> Location: http://code.google.com/p/okapi/source/list?name=html5
phil: Yves' page for 1.2. we can certainly use that as our direction
dave: will talk to david tomorrow about that
phil: tx
<daveL> scribe: daveL
felix: asks if anyone has further thoughts, or support for this new type
felix: no responses yet
shaun: no update on this
<fsasaki> ACTION: shaun to work on regex for validating regex subset proposal [recorded in http://www.w3.org/2013/01/16-mlw-lt-minutes.html#action02]
<trackbot> Created ACTION-385 - Work on regex for validating regex subset proposal [on Shaun McCance - due 2013-01-23].
felix: has been discussed in response to christian's comment
... any further comments?
marcis: what is the goal?
felix: christian suggested merging the term and disambig data categories
... but the response was that both had distinct use cases, that could be merged but are valid individually
marcis: would not want to drop a data category; term is easier to implement and its purpose is clear
... not so clear on the disambiguation category, in terms of what is possible to do with it
... for example there may be other types that might be useful in the disambiguation use case
... and doing term management with disambig would make it very heavy
... so there might need to be more attributes specifically for named entities
... referencing input from W3C India received today
tadej: the motivation for a separate data category was that it covered some use cases that fell out of the scope of terminology
... by providing some additional context
... but do see that there is some commonality
... Also term must remain to keep compatibility with terminology in ITS1
jörg: still in favour of having the two data categories
scribe: since disambiguation can cover many other tasks in content or NLP processing
... whereas term is more specific
pedro: the sort of text we mark up is different in both cases so it makes sense to keep the distinction
tadej: agree the granularities are quite limiting, or should we have more identifiers to support this
scribe: but this might be more complicated
jorge: yes this would be more complicated, clearer as it is
<fsasaki> http://tinyurl.com/its20-testsuite-dashboard
felix: christian will dial in to the f2f to discuss this and resolve the topic next week
... we also need to consider the number of implementations, which are not so many, when considering any possible merger
Des: agree with jorge, keep them separate as they are distinct use cases
jorge: clarified, attributes as defined currently are clearer than making them more fine grained
felix: reminds that the W3C process requires responding to comments, which involves some work
<Yves_> could we talk about annotorsRef https://www.w3.org/International/multilingualweb/lt/track/issues/71 a bit during this call?
felix: replying to a question from Dave: the current number of comments received is good
yves: for two data categories, provenance and locQualityIssue, we can have information from multiple annotators, but we have no way of expressing this with annotatorsRef
... for the current implementation, we assume the most recent annotator is the correct one, but this is not ideal
... provenance especially has multiple items and requires annotatorsRef
<fsasaki> daveL: will look into this thread
<scribe> scribe: daveL
phil: let's talk about the ordering of provenance
<Yves_> provenance data category https://www.w3.org/International/multilingualweb/lt/track/issues/72
<fsasaki> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0090.html
<Arle_> I am back on the call.
<fsasaki> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0061.html
<fsasaki> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0066.html
felix: this was a discussion of whether there was any implication between ordering and time of record
<fsasaki> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0055.html
<fsasaki> (mails related to the discussion)
phil: asks about the lack of a date stamp
<fsasaki> daveL: a date stamp was discussed
<fsasaki> .. there are two aspects:
<fsasaki> .. a lot of the original requirements didn't have a strong need for a time stamp
<fsasaki> .. the original requirement was about identifying richly enough so that we can differentiate
<fsasaki> .. see e.g. "agent provenance" that used to include that
<fsasaki> .. the 2nd aspect:
<fsasaki> .. we discussed whether the order in which the provenance records are added is significant
<fsasaki> .. but from an implementation point of view it is again complicated
<fsasaki> .. and there hadn't been much of a call for this during requirements gathering
<fsasaki> .. "time" also has various aspects: start of a translation, finish, duration, ...
<fsasaki> .. it is also a point that the provenance wg in w3c had addressed
<fsasaki> .. so we just provide identifiers of who made the translation and revision
<fsasaki> .. for knowing more there is the provenance model
<fsasaki> .. more = more about time
<fsasaki> .. so in summary, there was no big requirement to have a time stamp
<fsasaki> .. and *if* you want to do that, you can use the w3c prov model
<fsasaki> .. I'll reply to that mail thread
<fsasaki> pablo: I think provenance can stay as is
<fsasaki> .. adding a time stamp can be useful and interesting - if every implementer is fine with that i'm fine too
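A sketch of the position summarized above: ITS 2.0 provenance records identify agents but have no time-stamp attribute; richer temporal data can live in an external W3C PROV resource referenced via provRef (all values below are illustrative):

```xml
<its:provenanceRecords xml:id="pr1"
    xmlns:its="http://www.w3.org/2005/11/its">
  <!-- who translated / revised; no attribute for when -->
  <its:provenanceRecord org="exampleLSP" tool="exampleCAT"
      revPerson="reviser7"
      provRef="http://example.com/prov/record47"/>
</its:provenanceRecords>
```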
<scribe> scribe: daveL
felix: adding a time stamp is a substantive change and would require another call, plus tests etc
<fsasaki> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0090.html
felix: from this week on, people should stop using the google docs and update the test suite master themselves
<fsasaki> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Dec/0087.html
felix: we still need some input on tests related to assertions (MUSTs) which need suggested tests
<fsasaki> http://www.w3.org/International/multilingualweb/lt/wiki/PragueJan2013f2f
<fsasaki> http://www.w3.org/International/multilingualweb/lt/wiki/PragueJan2013f2f#Objectives
felix: thanks to jirka for organising this
<fsasaki> http://www.w3.org/International/multilingualweb/lt/wiki/PragueJan2013f2f#Participants
jirka: if you are not yet registered, please do so asap. The number of people needs to be known for wifi etc.
felix: also need to know in advance when people want to dial in for organising the agenda
<fsasaki> http://www.w3.org/International/multilingualweb/lt/wiki/PragueJan2013f2f#Objectives
felix: going through objectives
<fsasaki> http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary
felix: in particular the relationship between the different posters, with links to where people can access them; also update the high level summary, adding any new use cases
<fsasaki> daveL: some time to discuss preparing EU project review?
felix: also brainstorm on activities for the rest of the year, on new projects and the synergy between them
... the Rome preparation should cover that.
<fsasaki> scribe: fsasaki
<omstefanov> as I will not be able to take part in the f2f in Prague, but definitely intend to come to Rome, please make sure preps for Rome are recorded in writing
david: phil asked on that, we got good comments from xyz
... status of the xliff mapping - the only written piece is the xliff mapping wiki
<dF> http://www.w3.org/International/multilingualweb/lt/wiki/XLIFF_Mapping
david: will work on this today, yesterday / today was the EC deadline
... we should publish this as a note / PC
... what is the editorial setup for such a note?
... we will need an additional namespace itsx
felix: update on implementation prototype?
david: solas is consuming ITS2 categories
... like OKAPI does
... that is being tested as part of the test suite
... that is consumed by various components of the solas architecture
... one is an MT broker
... it works with different MT systems
... it depends on the MT systems whether they can deal with ITS metadata
... moravia is contributing to that
... m4loc can be used as middleware
... in our current prototype the MT services expose the m4loc service
... from the deliverable - an open source xliff roundtrip
... the okapi filter interprets the ITS decoration
... then the mapping in the wiki is used
... it is consumed by an open source middleware component
felix: would be good to see a demo
david: will do, in prague and in rome
ankit: we are waiting for some sort of data from cocomore
felix: what data?
ankit: we said that cocomore would provide us with annotated data
ankit will provide the module by the prague f2f
pedro: will have annotated data from a spanish client
... the client is the spanish gov tax office
... they will annotate with ITS metadata for this showcase
... spanish content in HTML5
... we will generate english content
... and annotate it in the output of the real-time system
felix: so ankit could later use the data to test the module?
ankit: for training data, as much as you can get
pedro: the annotated data from cocomore is html content
... we will generate content in chinese and french
... so ankit can take chinese, french, german into account in his system
... and spanish
... this will be german to english, german to french, german to chinese, german to spanish
<Pedro> Showcase WP3 (Cocomore-Linguaserve) is German to Chinese and German to French
<Clemens> right!
<Pedro> Showcase WP4 (Linguaserve-Lucy-DCU) is the full demo Spanish to English, and partial demo Spanish to French and Spanish to German
thanks to everybody for staying longer, meeting adjourned