IRC log of mlwDub on 2012-06-13

Timestamps are in UTC.

08:08:34 [RRSAgent]
RRSAgent has joined #mlwDub
08:08:34 [RRSAgent]
logging to
08:08:41 [Zakim]
Zakim has joined #mlwDub
08:08:47 [fsasaki]
meeting: MLW workshop
08:08:50 [fsasaki]
chair: arle
08:09:02 [fsasaki]
present: again, many, people
08:09:03 [tadej]
tadej has joined #mlwDub
08:09:20 [leroy]
leroy has joined #mlwDub
08:09:31 [Yves_]
Yves_ has joined #mlwDub
08:09:35 [mhellwig]
mhellwig has joined #mlwdub
08:10:11 [daveL]
daveL has joined #mlwDub
08:10:15 [Jirka]
Jirka has joined #mlwDub
08:10:26 [Jirka]
scribe: Jirka
08:10:59 [Jirka]
Change to agenda: Felix will go through the data category implementation commitments list first
08:11:37 [Milan]
Milan has joined #mlwdub
08:12:54 [Jirka]
topic: Implementation commitments
08:13:04 [Jirka]
led by Felix
08:13:43 [gderiard]
gderiard has joined #mlwDub
08:14:21 [Jirka]
Felix: review of agenda
08:14:40 [dgroves]
dgroves has joined #mlwdub
08:15:31 [Arle]
Key is *real* commitments, not just interest.
08:16:49 [Jirka]
Dave: Where will we collect implementation commitments?
08:16:51 [Arle]
Decisions (not final details) must be complete by July, with details by November.
08:17:06 [Jirka]
Felix: proposes creating a wiki page for this data
08:17:54 [Des]
Des has joined #mlwdub
08:18:40 [Jirka]
... everyone should think about which categories can gain support in implementations; we will discuss this in the afternoon
08:19:00 [Jirka]
... what we agree on today will appear in the draft; we can change it later if necessary
08:20:24 [dF]
dF has joined #mlwdub
08:20:51 [Jirka]
Sebastian Hellmann: There is overlap between NIF and the proposed ITS data categories
08:21:13 [thomas_]
thomas_ has joined #mlwDub
08:22:29 [Jirka]
... the RDF world can reuse ITS concepts if we provide an ITS OWL ontology
08:23:30 [omstefanov]
omstefanov has joined #mlwDub
08:24:46 [Jirka]
Tadej: RDF/RDFa can be used only as an interchange format; in general it will not be possible to reconstruct the original HTML+ITS from NIF
08:25:12 [Jirka]
ACTION: Tadej to write a proposal on how a mapping between NIF and HTML+ITS would look, with concrete examples
08:27:34 [Jirka]
topic: XLIFF extensibility
08:27:51 [Jirka]
Des: Are we talking about extensibility in XLIFF in general, or only for ITS?
08:28:13 [Jirka]
Felix: In general
08:29:03 [Jirka]
David: Individual people can send comments to XLIFF TC asking for improved extensibility
08:29:05 [thomas]
thomas has joined #mlwDub
08:29:11 [Pedro]
Pedro has joined #mlwDub
08:30:23 [Jirka]
Bryan: The message could be that we want the custom namespaces feature to be improved
08:30:51 [Jirka]
Richard: We should personally ask who will support this
08:31:13 [micha]
micha has joined #mlwdub
08:31:31 [Jirka]
ACTION: Felix to Draft email to XLIFF committee about improving extensibility [due 2012-06-15]
08:31:45 [fsasaki]
will do that by this evening
08:31:55 [fsasaki]
to be sent to public-multilingualweb-lt
08:32:28 [Jirka]
rrsagent, draft minutes
08:32:28 [RRSAgent]
I have made the request to generate Jirka
08:33:02 [Jirka]
Topic: LocWorld
08:33:48 [Jirka]
David: October 16th - our ITS track
08:34:08 [Jirka]
... Oct 17th: pre-conference track for a broader audience
08:34:36 [Jirka]
... LocWorld is on the 18th and 19th; we will have a few talks about ITS there
08:35:14 [Jirka]
Felix: We need to find September dates for additional technical meeting
08:36:21 [Jirka]
... proposal Sep 17/18 in Prague
08:37:36 [Jirka]
... and alternative is 25/26
08:39:00 [Jirka]
... final decision is September 25/26
08:39:17 [Jirka]
ACTION: Jirka to arrange F2F meeting in September at UEP
08:40:52 [Jirka]
topic: Project Information Metadata
08:40:56 [Jirka]
led by David Filip
08:41:29 [fsasaki]
(re the last session: people interested in the XLIFF email draft: Richard, Des, Felix, who else?)
08:43:04 [Jirka]
David: summarizes current PI related proposed data categories
08:43:43 [Jirka]
... there are too many overlapping data categories
08:43:46 [fsasaki]
all, my list of consensus is now at - please have a look and come back to it in the afternoon
08:49:10 [Jirka]
Felix: we can create a best practice recommending reuse of existing Dublin Core properties
08:51:22 [Jirka]
Arle: There is also a big overlap with ISO 10669
08:52:27 [fsasaki]
my idea is to reuse the HTML "meta" element and have dc.subject in it with a scheme, e.g. ISO 10669, DDC (if you want), ...
08:52:48 [RRSAgent]
I have made the request to generate fsasaki
08:53:34 [Jirka]
Maxime: meta applies to the document as a whole, but you might want to annotate several chunks inside the document
08:54:06 [Jirka]
Arle: the idea was to be able to apply those data categories to any part of a document
08:55:03 [Jirka]
Dave: we should incorporate into ITS only things that help in translation and have a use case; we shouldn't supply general CMS-related metadata
09:00:26 [Jirka]
ACTION: Felix to Summarize discussion around Domain
09:00:37 [Jirka]
David: Genre discussion
09:01:16 [fsasaki]
mail from Georg here
09:01:29 [PaulMacAree]
PaulMacAree has joined #mlwDub
09:01:55 [Jirka]
... consensus about dropping genre
09:02:09 [Jirka]
... formatType discussion
09:05:16 [fsasaki]
Dublin Core examples: see (in German, apologies)
09:08:10 [RRSAgent]
I have made the request to generate fsasaki
09:08:26 [Jirka]
... consensus for dropping translationQualification
09:15:57 [Jirka]
... back to domain
09:16:27 [Jirka]
Yves: shouldn't we provide an ITS data category for domain and, for HTML, map it to Dublin Core in meta?
09:20:21 [Jirka]
There seems to be interest in implementing domain
09:20:50 [fsasaki]
I think Pedro, Declan, Yves raised their hands - please protest otherwise
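The domain idea discussed above (an ITS domain data category that, for HTML, maps to Dublin Core in a meta element, as Felix and Yves suggest) can be sketched as a small extractor. This is a minimal sketch, not the agreed mapping rule: the `dc.subject` name and `scheme` attribute follow Felix's suggestion, while the class name `DomainMetaCollector` and the sample DDC value are hypothetical.

```python
from html.parser import HTMLParser


class DomainMetaCollector(HTMLParser):
    """Hypothetical helper: collects (scheme, value) pairs from
    <meta name="dc.subject" scheme="..." content="..."> elements."""

    def __init__(self):
        super().__init__()
        self.domains = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        a = dict(attrs)
        name = (a.get("name") or "").lower()
        if name == "dc.subject" and a.get("content"):
            self.domains.append((a.get("scheme"), a["content"]))


page = """<html><head>
<meta name="DC.subject" scheme="DDC" content="004">
<meta name="description" content="not a domain">
</head><body><p>Text to translate</p></body></html>"""

collector = DomainMetaCollector()
collector.feed(page)
collector.close()
print(collector.domains)  # [('DDC', '004')]
```

As Maxime points out, meta only covers the document as a whole; domains for chunks inside the document would need local markup or a global rule.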
09:21:22 [Jirka]
... register discussion
09:34:03 [fsasaki]
FYI, here is the domain mapping rule example
09:35:25 [Jirka]
... genre and purpose are dropped
09:36:02 [Jirka]
rrsagent, draft minutes
09:36:02 [RRSAgent]
I have made the request to generate Jirka
10:03:05 [mhellwig]
scribe: mhellwig
10:03:34 [mhellwig]
topic: Translation Process Metadata
10:05:44 [gderiard]
gderiard has joined #mlwDub
10:05:46 [RRSAgent]
I have made the request to generate fsasaki
10:06:26 [omstefanov]
omstefanov has left #mlwDub
10:06:43 [mhellwig]
dF: The agenda is what relates to process metadata. Everything we define is orthogonal; all data categories are orthogonal, but they need to be in sync
10:06:44 [omstefanov]
omstefanov has joined #mlwDub
10:07:10 [micha]
micha has joined #mlwdub
10:07:31 [mhellwig]
... orthogonal categories must fit together
10:08:45 [mhellwig]
... the state machines need to be in sync; orthogonal values must make sense to the parties in the chain
10:10:36 [mhellwig]
... a particularly orthogonal category is provenance
10:11:17 [Zakim]
Zakim has left #mlwDub
10:12:00 [mhellwig]
... there is no single translation process; requirements vary, so categories must sync on the fly.
10:13:03 [mhellwig]
... working on a category integration platform - a test bed - to simulate lifecycle designs
10:13:33 [mhellwig]
... Pedro will talk about the process metadata in the requirements document
10:14:36 [omstefanov_]
omstefanov_ has joined #mlwdub
10:14:48 [mhellwig]
... provenance metadata categories should be connected to process metadata
10:15:43 [mhellwig]
fsasaki: there are expectations for state machines: what states are allowed and so on. How far are you on this?
10:16:15 [mhellwig]
dF: discussed yesterday; CNGL is interested and I think it can be interesting for this work, particularly the testing setup.
10:16:54 [mhellwig]
... use of general process ?? labels most important results.
10:17:40 [mhellwig]
... Des talked about process boundaries, we should be aware of them.
10:18:46 [mhellwig]
Des: on the state machine question, I wouldn't like to enforce state machine in metadata. It should be purely informative.
10:20:45 [mhellwig]
Pedro: it's true that it's quite complex, because we cannot constrain the things people want to do in their workflows
10:21:05 [mhellwig]
... [talking about process metadata] there are three data categories
10:21:29 [mhellwig]
... readiness, progress indicator, localisationCache
10:22:03 [mhellwig]
... the last category is a special case, because here LSPs are not only interacting but also publishing to the Web.
10:22:32 [dF]
dF has joined #mlwdub
10:22:44 [mhellwig]
... readiness should indicate readiness for a particular process, its priority, when it is expected to be completed, and whether an element may already have been committed
10:23:18 [mhellwig]
... currently target language is not part of the metadata
10:24:01 [mhellwig]
... contentType (e.g. MIME, ...); pivotLang, if you go through an intermediate language, or to save costs, e.g. when translating from Portuguese to Brazilian Portuguese
10:24:12 [mhellwig]
... it reduces costs because you only have to revise
10:24:30 [Arle]
Note: remember slides are available at
10:24:30 [mhellwig]
... contentResultsSource, whether the original content has to be returned
10:24:48 [Marion_Shaw]
Marion_Shaw has joined #mlwDub
10:24:56 [mhellwig]
... contentResultTarget, should all languages be sent in one file or as separate files per language
10:25:32 [mhellwig]
... the most important part is to specify the process without constraining it
10:25:53 [mhellwig]
... process data model 1, define phases and in each phase different processes
10:26:56 [mhellwig]
... process data model, a list of processes created from norm ??
10:27:25 [mhellwig]
... this list could be published and maintained so it wouldn't need to be a closed list
10:27:38 [mhellwig]
... so the list can evolve
10:28:11 [mhellwig]
... three scenarios of application: consumes (c), generates (g), transforms (t) content
10:28:51 [mhellwig]
... progress indicator, to return information to the CMS on how much has been completed
10:29:15 [mhellwig]
... for example, the TMS can return that a process has been 40% completed, but 60% are still to be done
10:29:28 [mhellwig]
... finally localisationCache. A lot of discussion about this
10:30:10 [mhellwig]
... it's about real-time translation. the LSP directly publishes and we want the client to be able to specify whether content should be cached
10:30:21 [mhellwig]
... this is an evolution of LSP
10:31:39 [mhellwig]
Des: localisationCache is a very good idea, but how do you know when the cache has been invalidated? When the source changes
10:32:05 [mhellwig]
Pedro: maybe you're right and we're missing a not-valid attribute
10:32:26 [mhellwig]
Des: there must be a trace back to the source, and if that source changes, your cache is out of date and invalid
10:33:48 [mhellwig]
action: Des to write example for how to deal with invalid cache (re: localisationCache)
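Des's point that a cached translation stays valid only while its source is unchanged can be sketched by storing a hash of the source next to each cached translation. This is a minimal sketch under assumptions: the class and method names are hypothetical, and using a SHA-256 hash as the "trace back to the source" is one possible choice, not part of the localisationCache proposal.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(text: str) -> str:
    """Hash of the source text; any change to the source changes the hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class CachedTranslation:
    source_hash: str  # trace back to the exact source that was translated
    translation: str


class LocalisationCache:
    """Hypothetical cache keyed by (segment id, target language)."""

    def __init__(self):
        self._entries = {}

    def put(self, segment_id, source, target_lang, translation):
        self._entries[(segment_id, target_lang)] = CachedTranslation(
            fingerprint(source), translation)

    def get(self, segment_id, source, target_lang):
        entry = self._entries.get((segment_id, target_lang))
        if entry is None:
            return None  # nothing cached for this segment and language
        if entry.source_hash != fingerprint(source):
            return None  # source changed since translation: cache is invalid
        return entry.translation


cache = LocalisationCache()
cache.put("p1", "Hello world", "de", "Hallo Welt")
print(cache.get("p1", "Hello world", "de"))    # Hallo Welt
print(cache.get("p1", "Hello, world!", "de"))  # None (stale)
```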
10:34:46 [mhellwig]
Des: The progress indicator is very useful and valid, but it is an ideal candidate for a service boundary. Good candidate for a standard API
10:35:00 [mhellwig]
Pedro: progress-indicator better in the API
10:35:49 [mhellwig]
dF: agrees, it's an API function, not about the content
10:36:11 [mhellwig]
... the API is a by-product of the implementation
10:37:34 [mhellwig]
Des: there is a case for it to be in metadata. As a page changes, the system active on that data can update the metadata
10:38:08 [mhellwig]
Arle: where you have blind operation (crowd-sourcing), you may not have an API, so it could prove helpful there
10:39:18 [mhellwig]
dF: progress indicator is a project attribute, not a content attribute
10:40:32 [mhellwig]
Dag: we have large XML files with a lot of data. A translator might not complete one in a day, so you may want information on the progress. There is a case for having progress-indicator
10:41:10 [mhellwig]
fsasaki: question to XLIFF TC members: how is this handled in XLIFF?
10:41:22 [mhellwig]
BryanSchnabel: there are state attributes
10:41:28 [mhellwig]
fsasaki: so there is a solution already
10:42:27 [mhellwig]
KistenSteffn: it would be a good indicator for technical projects in general.
10:42:51 [mhellwig]
Pedro: I think progress indicator is good and useful, but the question is who will implement it
10:43:38 [mhellwig]
fsasaki: we want to focus on content-related metadata, but if you see something useful coming out of the implementation, it's good to publish that information
10:44:14 [mhellwig]
fsasaki: is it easier to put it in the requirements or figure it out from the implementation? You should figure that out
10:44:50 [mhellwig]
DaveL: process name information should be informative
10:46:08 [mhellwig]
Pedro: process data model 1 looks good, but it makes some a priori assumptions
10:46:25 [mhellwig]
... process data model 2 does not make such assumptions
10:47:40 [mhellwig]
AlexLik: it's worthwhile to watch the terminology
10:48:19 [mhellwig]
Pedro: is XLIFF an implementation of ITS 2.0?
10:48:29 [mhellwig]
fsasaki: if there is an existing solution, use that.
10:48:50 [mhellwig]
Pedro: the fact that ITS can be implemented in various formats doesn't prevent us from having it in ITS 2.0
10:49:05 [mhellwig]
fsasaki: should refer to existing metadata and say we can use that metadata also in other file types
10:49:32 [mhellwig]
... but the definition wouldn't be ours. Otherwise we would have clashes: somebody uses the XLIFF attribute, somebody else the ITS one
10:50:57 [mhellwig]
Pedro: so what do we do with this data category?
10:51:16 [mhellwig]
DaveL: the discussion is about how we address the informative aspect
10:51:55 [mhellwig]
DaveL: should not spend a lot of time on this here if it's not normative
10:52:31 [mhellwig]
fsasaki: wiki page has been updated, there are currently 14 or 15 items with question marks
10:52:52 [mhellwig]
... need to decide now, what do we put in the normative part
10:53:34 [mhellwig]
... and then there are other parts, like readiness, where we would like to see more of how it's coming out of the implementation. Then we can include it in the informative part
10:54:05 [mhellwig]
Dag: what is different on these data categories from what's in XLIFF?
10:54:54 [mhellwig]
dF: the state attribute has a number of values, but these are only the values necessary for the lifecycle. You can't just use them; you need to complete the cycle
10:55:12 [mhellwig]
Des: agrees, there is a case for readiness within the CMS as well
10:55:42 [mhellwig]
... readiness is specific to workflows, so readiness is relevant only within that workflow
10:56:21 [mhellwig]
Pedro: we have this in our API, but we have to find out whether it's also valid for other formats, like HTML
10:56:34 [mhellwig]
... probably can't resolve in 5 minutes
10:56:44 [mhellwig]
... but we need to resolve this
10:57:06 [mhellwig]
... examples, readiness information: what shall we do with that?
10:57:24 [mhellwig]
... targetLanguages also more in the API.
10:57:33 [mhellwig]
Des: most of this information (on readiness) is project level
10:58:19 [mhellwig]
dF: place for using the t extension; this kind of information belongs to the project level
11:00:00 [mhellwig]
fsasaki: one point before we conclude the session. let's discuss over lunch
11:00:11 [mhellwig]
... then get back and discuss this
11:01:10 [mhellwig]
Olav: we need to define the state and the process more clearly. Are we defining the process, what actions we want done with it? That will define what goes into Linport and into XLIFF.
11:01:54 [mhellwig]
Pedro: readiness is about what's to do. Not what has been done, that's provenance
11:02:27 [mhellwig]
dF: about readiness, there is too much in it. but I see value for part of it
11:02:41 [mhellwig]
... in a publisher - CMS workflow
11:03:22 [mhellwig]
Pedro: in the beginning Kimmo asked us why we want to join; and one answer was that we want to move from workflows to data driven processes
12:00:32 [Arle]
Arle has joined #mlwDub
12:07:52 [tadej]
tadej has joined #mlwDub
12:09:04 [mhellwig]
mhellwig has joined #mlwdub
12:10:13 [Yves_]
Yves_ has joined #mlwdub
12:10:30 [dF]
dF has joined #mlwdub
12:10:36 [fsasaki]
topic: provenance section with David Lewis
12:10:39 [fsasaki]
scribe: dF
12:10:43 [dF]
Scribe: dF
12:10:45 [RRSAgent]
I have made the request to generate fsasaki
12:11:18 [dF]
DaveL: Explains that provenance seems to be a general feature that appears in several areas
12:11:43 [dF]
Use cases are: localisation job monitoring
12:12:06 [dF]
Synchronizing Parallel Source revisions
12:12:25 [Yves_]
Yves_ has joined #mlwdub
12:12:30 [dF]
Low cost assembly of parallel text with some idea of quality
12:12:41 [RRSAgent]
I have made the request to generate fsasaki
12:12:58 [dF]
Distributed quality auditing
12:13:30 [dF]
in the sense of customer/provider quality data synchronization
12:13:48 [dF]
Dave was checking a use case with Phil from Vistatec
12:14:16 [dF]
auditing creates the need to synchronize QA reports
12:14:48 [dF]
Provenance appears in different spaces. No single killer use case
12:15:05 [dF]
There is a W3C Provenance WG
12:15:31 [dF]
as a W3C WG we should look at work of other WGs
12:15:54 [dF]
Provenance WG is revising their time schedule
12:16:12 [dF]
Current timeline looks like Jan 2013
12:16:39 [dF]
Relating Provenance WG approach
12:17:01 [dF]
Facts are recorded about entities
12:17:17 [dF]
Agents can act on entities
12:17:44 [dF]
Entities can be created or transformed by activities performed by agents
12:17:51 [omstefanov_]
omstefanov_ has joined #mlwDub
12:18:22 [mlefranc]
mlefranc has joined #mlwdub
12:18:24 [dF]
Entities can be attributed to Agents
12:19:07 [dF]
Data model of the WG is PROV-DM
12:19:42 [dF]
They also have RDF/OWL and XML format (not well documented)
12:19:51 [dF]
also a query interface
12:20:33 [micha]
micha has joined #mlwdub
12:20:40 [dF]
ITS linking options
12:20:53 [dF]
you can use different granularities
12:21:01 [dF]
it is fairly flexible
12:21:15 [dF]
any number of agents can be associated with an entity
12:21:54 [dF]
Quite heavy and not at all intended as inline markup
12:22:16 [dF]
Take it as is?
12:22:30 [dF]
Question by Felix
12:23:49 [dF]
Felix: Can use existing values. Provenance record would be on document level and URI pointing to places
12:24:11 [dF]
Dave: With Dom, we have such an implementation
12:24:23 [fsasaki]
felix: mechanism to use the existing values that are not ID attributes would be idValue, if we go for that, see
12:24:52 [dF]
we can do it in various places, basically the same way as we do ITS
12:25:25 [dF]
In a published document you do not really want to include the fine-grained provenance info
12:26:26 [dF]
you can later derive segment relevant info from the higher level
12:26:46 [dF]
There are options for RDF version
12:27:09 [dF]
We did a fine grained implementation based on hashes
12:27:28 [dF]
The hashes need to be recalculated if the content changes
12:27:57 [dF]
Dave is asking Tadej about his implementation: Just one URL?
12:28:38 [dF]
Showing an example at span level referencing a textual provenance store
12:29:16 [dF]
entity e1 was generated by activity a1
12:29:28 [dF]
at timestamp
12:29:43 [fsasaki]
example is also available at
12:29:48 [dF]
start and stop time
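The PROV concepts summarized above (entities generated by activities that agents perform, and entities attributed to agents) and the span-level example (entity e1 generated by activity a1, with start and stop times) can be modeled as below. This is a minimal sketch: the class names, ids, and timestamps are illustrative, and the relation names follow PROV-N as later published, since the Provenance WG's model was still in draft at the time of this meeting.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Agent:
    id: str  # e.g. a translator, reviewer, or MT engine


@dataclass
class Activity:
    id: str
    start: datetime
    end: datetime
    agents: List[Agent] = field(default_factory=list)  # wasAssociatedWith


@dataclass
class Entity:
    id: str  # e.g. a span of content referenced from the document
    generated_by: Optional[Activity] = None  # wasGeneratedBy
    attributed_to: List[Agent] = field(default_factory=list)  # wasAttributedTo


def to_prov_n(entity: Entity) -> List[str]:
    """Render the entity's provenance as PROV-N-style statements."""
    lines = []
    act = entity.generated_by
    if act is not None:
        lines.append(f"activity({act.id}, {act.start.isoformat()}, {act.end.isoformat()})")
        lines.append(f"wasGeneratedBy({entity.id}, {act.id}, {act.end.isoformat()})")
        lines.extend(f"wasAssociatedWith({act.id}, {ag.id})" for ag in act.agents)
    lines.extend(f"wasAttributedTo({entity.id}, {ag.id})" for ag in entity.attributed_to)
    return lines


translator = Agent("translator1")
a1 = Activity("a1",
              start=datetime(2012, 6, 13, 12, 0, tzinfo=timezone.utc),
              end=datetime(2012, 6, 13, 12, 30, tzinfo=timezone.utc),
              agents=[translator])
e1 = Entity("e1", generated_by=a1, attributed_to=[translator])
print("\n".join(to_prov_n(e1)))
```

Keeping such records in a separate store and pointing to them from spans matches the point that a published document should not carry the fine-grained provenance inline.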
12:30:29 [dF]
As dF said earlier, we can provide best practice for other categories
12:30:47 [dF]
it would not necessarily be a primary ITS category
12:31:38 [dF]
you can record language in provenance and you can use its tag for that
12:32:02 [dF]
provenance only tells you what happened, never what should happen
12:32:28 [dF]
the reason to do provenance is to record trustworthiness
12:33:00 [dF]
Felix: The general mechanism is very clear
12:33:54 [dF]
This needs to be finalized as best practice for others to use for their provenance related categories
12:34:35 [dF]
DaveL: The WG is approachable; they are quite open, not too much grounded in industrial processes
12:34:51 [dF]
Yves has question to a previous slide
12:35:09 [dF]
Would we need to process the txt version?
12:35:40 [dF]
DaveL: they have on top of txt, xml, RDF, and query interface
12:36:15 [dF]
Maxim: But their XML is just one possible serialization; it won't be compatible with our XML
12:36:38 [dF]
RDF would be easier; they should provide a parser
12:37:08 [dF]
Maxim: we can put lot of categories there
12:37:30 [dF]
DaveL: we need a few initial use cases
12:37:46 [dF]
looking at existing SQL quality records
12:38:07 [dF]
we can chain different tools
12:38:37 [dF]
Jean: Who would own these resources
12:39:20 [dF]
Dave: LSP record would contain the XLIFF etc process story
12:39:31 [dF]
who translated, who reviewed etc.
12:39:45 [dF]
It is valuable business intelligence
12:39:59 [dF]
You want to allocate blame
12:40:26 [Des]
Des has joined #mlwdub
12:40:28 [dF]
You get the answer quicker if everything is in one place
12:41:17 [dF]
we should have a checklist of categories that should be agreed between customer and provider
12:41:39 [dF]
there is a commercial tension
12:42:26 [dF]
Dag: Comment, one fairly nice way, not convinced that RDF is necessary, XML should be OK
12:42:37 [dF]
Info on MT processing should be embedded
12:42:43 [dF]
should not be too heavy
12:42:57 [dF]
there would be complexity in processing the link
12:43:05 [dF]
Dave: agrees
12:43:45 [dF]
with the second point (MT provenance embedded rather than linked)
12:44:07 [dF]
Tadej: Adding to the request for inline inclusion
12:44:53 [dF]
does it all relate to content? Also on markup? Or doesn't it matter?
12:45:17 [dF]
Dave: You create connection between content and a piece of metadata
12:45:34 [RRSAgent]
I have made the request to generate Yves_
12:45:37 [dF]
We are creating a binding that is very valuable
12:46:19 [dF]
Tadej: Provenance should contain the binding
12:46:40 [dF]
Pedro: It reminds me of QR code
12:46:53 [dF]
Can we use it?
12:47:12 [dF]
Olaf: The code is just a representation
12:47:31 [dF]
[you still need the underlying categories]
12:48:12 [dF]
Dave: there can be an alternative to inline QA reporting, but as Dag points out, there is a cost attached to it
12:48:47 [dF]
Dave: question for Felix. How about WGs working in parallel: legal, political?
12:49:08 [dF]
The rule for referencing is not more than two steps behind
12:49:48 [dF]
DaveL: Our requirement is simple. Informally it should be OK
12:50:31 [dF]
Granted, there is a risk
12:50:45 [dF]
not that it would not be possible to do
12:51:25 [dF]
Action Item: Felix to check with W3c on status of the Provenance group to manage the dependency risk
12:51:56 [dF]
Action Item: fsasaki to check with W3c on status of the Provenance group to manage the dependency risk
12:52:31 [dF]
Richard: they should be excited about the real-world use case we're bringing
12:52:44 [dF]
Felix goes to TPAC, maybe there
12:53:42 [dF]
Sebastian: question. Are there other similarly related categories in ITS?
12:54:06 [dF]
DaveL: we always talk about scope, global or span
12:54:25 [dF]
It was not fully QAed
12:54:43 [dF]
Arle: We identified 3 levels: span, div, document
12:55:09 [dF]
Arle, DaveL: This is now outdated, needs revision
12:55:35 [dF]
Pedro: Tabular overview is out of date
12:55:54 [dF]
Sebastian wants to relate categories where it makes sense
12:56:04 [dF]
Felix: disagrees
12:56:36 [dF]
It may make sense to combine them, but there are different use cases
12:56:37 [RRSAgent]
I have made the request to generate Yves_
12:56:53 [dF]
Interrelations should not be overloaded
12:57:23 [dF]
dependencies would be too complex and could prevent new usage scenarios
12:58:16 [dF]
Sebastian: If you do not specify enough, people will not know how to use it
12:59:42 [dF]
Felix: no inline provenance in yet
13:00:08 [dF]
Phil was talking about quality records
13:00:20 [dF]
this is strictly not provenance
13:00:54 [dF]
DaveL: we need to specify use cases for inline and not inline
13:01:20 [dF]
Especially agents are reusable
13:02:12 [dF]
Who scribes this?
13:02:30 [dF]
I think we do not have trackbot
13:02:35 [dF]
How to invite?
13:02:52 [fsasaki]
scribe: fsasaki
13:03:25 [fsasaki]
topic: translation metadata
13:04:17 [fsasaki]
yves going through target pointer proposal
13:04:19 [trackbot]
trackbot has joined #mlwDub
13:04:19 [trackbot]
Sorry... I don't know anything about this channel
13:04:19 [trackbot]
If you want to associate this channel with an existing Tracker, please say 'trackbot, associate this channel with #channel' (where #channel is the name of default channel for the group)
13:04:25 [fsasaki]
not sure what the consensus currently is
13:05:04 [fsasaki]
dave: agree that this is a real use case
13:05:14 [dF]
trackbot, associate this channel with #channel #mlw-lt
13:05:14 [trackbot]
Associating this channel with #channel...
13:05:14 [trackbot]
Sorry... I don't know anything about this channel
13:05:14 [trackbot]
If you want to associate this channel with an existing Tracker, please say 'trackbot, associate this channel with #channel' (where #channel is the name of default channel for the group)
13:05:47 [dF]
trackbot, associate this channel with #mlw-lt
13:05:47 [trackbot]
Associating this channel with #mlw-lt...
13:06:35 [fsasaki]
yves: two proposals for implementations already, myself and shaun
13:08:53 [fsasaki]
richard: why is this needed?
13:08:59 [fsasaki]
yves explains the proposal again
13:09:08 [fsasaki]
des: who would generate and consume this?
13:09:25 [fsasaki]
yves: people who work with Qt TS files
13:09:43 [fsasaki]
.. people who use XLIFF files but don't have an XLIFF-specific tool
13:09:54 [fsasaki]
des: I see it in an interchange format, but not in a resource format
13:10:16 [fsasaki]
yves: you can use your rule to isolate one language as the source and another as the target
13:10:43 [fsasaki]
.. there are quite a lot of resource file formats that have that information
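Yves's point that a rule can isolate source and target inside one resource file can be illustrated with a Qt TS file, where each message holds a source and a translation. This is a minimal sketch in the spirit of the target pointer proposal, not its actual syntax: the `pairs` helper and its container/source/target parameters are hypothetical, and ElementTree's limited path syntax stands in for full XPath selectors.

```python
import xml.etree.ElementTree as ET

# Qt Linguist .ts snippet: source and target live side by side in one file.
TS = """<TS version="2.1" language="de_DE">
<context><name>MainWindow</name>
<message><source>Open</source><translation>Öffnen</translation></message>
<message><source>Save</source><translation type="unfinished"></translation></message>
</context></TS>"""


def pairs(root, container, source_path, target_path):
    """Hypothetical rule: for each container element, find the source and
    the target relative to that same container (ElementTree paths)."""
    for unit in root.iterfind(container):
        src = unit.find(source_path)
        tgt = unit.find(target_path)
        if src is not None:
            yield (src.text or "", None if tgt is None else (tgt.text or ""))


root = ET.fromstring(TS)
print(list(pairs(root, ".//message", "source", "translation")))
# [('Open', 'Öffnen'), ('Save', '')]
```

A generic tool that knows nothing about TS files could use such a rule to extract and write back translations, which matches the use case of people without a format-specific tool.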
13:11:01 [fsasaki]
des: how does target relate to target languages?
13:11:18 [fsasaki]
yves: workflow related
13:11:23 [fsasaki]
.. the name is maybe not good
13:11:46 [fsasaki]
.. other data category: locale filter
13:12:58 [fsasaki]
.. indicate what needs to be translated to specific locales
13:13:36 [fsasaki]
.. would be a BCP 47 language code
13:14:36 [fsasaki]
alex: does it have to do with target language?
13:14:44 [fsasaki]
yves: it does, it is like a traditional translate
13:22:14 [dF]
Discussion on BCP 47
13:22:44 [dF]
Action Item: Felix to follow up on usage of BCP 47
13:22:44 [trackbot]
Sorry, couldn't find user - Item
13:23:01 [fsasaki]
action: shaun to flesh out locale proposal
13:23:01 [trackbot]
Created ACTION-107 - Flesh out locale proposal [on Shaun McCance - due 2012-06-20].
13:23:19 [dF]
Action Item: fsasaki to follow up on usage of BCP 47
13:23:19 [trackbot]
Sorry, couldn't find user - Item
13:24:12 [RRSAgent]
I have made the request to generate fsasaki
13:24:25 [dF]
It was still not recorded by trackbot.
13:31:56 [Tony]
Could look at XSLT xsl:strip-space and xsl:preserve-space elements:
13:34:24 [RRSAgent]
I have made the request to generate fsasaki
14:00:26 [mhellwig]
mhellwig has joined #mlwdub
14:01:11 [RRSAgent]
I have made the request to generate fsasaki
14:02:51 [RRSAgent]
I have made the request to generate fsasaki
14:03:47 [fsasaki]
scribe: Arle
14:03:52 [fsasaki]
continuation of session with Yves
14:03:56 [RRSAgent]
I have made the request to generate fsasaki
14:04:09 [Arle]
Yves: Next one is autoLanguageProcessingRule. It tells how the content should be translated, transliterated, MT OK, etc.
14:04:30 [Arle]
..I don't think there is any implementation commitment.
14:04:39 [Arle]
Felix: Thumbs down.
14:05:43 [dF]
dF has joined #mlwdub
14:06:22 [Arle]
Yves: Elements within Text is from 1.0, but there it is only a global category. I do not recall why we made that exception to the general case. Perhaps we did not find any exceptions. But after publishing it, we got requirements for it. We have two possible implementations: ENLASO and (maybe) SDL. The only change is to add a local aspect to it. There is nothing new except that you can specify it on the element.
14:06:38 [Arle]
.. We should have two implementations
14:07:42 [Arle]
Olaf-Michael: Why don't we try in version 2.0 to allow all attributes to be local or global? There is no reason why it couldn't be at both levels.
14:08:03 [Arle]
Arle: I think there are some exceptions, like mtDisambiguation.
14:09:35 [Arle]
Felix: Besides local and global, there is a way to point to content. Should we allow all of them for any category? It turns out that it represents different philosophies. For some there is no usage scenario. For someone looking at a table of features, seeing one that is logically local listed as able to be implemented globally can be confusing.
14:09:58 [Arle]
.. There are two perspectives. One is for clean writing of the spec; one is for implementation.
14:10:23 [Arle]
Olaf-Michael: But if we limit it and we run into the issue where someone realizes a case later, then you have to modify the spec.
14:11:06 [Arle]
Yves: But you need implementations, and we had none. You don't want to force the implementer to implement something they think is useless. Theory is one thing.
14:11:15 [Arle]
.. This was the only exception anyway.
14:12:04 [Arle]
Dave: For an implementation for something that is both local and global, do the implementations have to address precedence properly?
14:12:42 [Arle]
Felix: Yes. They have to observe defaults. For example, if you see something that specifies globally that something is translatable, it has to respect the local override.
14:13:00 [Arle]
Dave: So you can't claim conformance by doing only local if it can be global?
14:13:25 [Arle]
Felix: You can do either. Let's look at conformance.
14:13:33 [Arle]
Richard: What was the use case?
14:13:55 [Arle]
Ingo: If you have an ITS processor that doesn't support XPath, you can still handle elements within text.
14:14:12 [Arle]
Yves: You could implement ITS with only local
14:14:21 [Arle]
Richard: Did you have customers who demanded this?
14:14:26 [Arle]
Ingo: Not at that time.
14:15:42 [Arle]
Felix: Conformance from ITS 1.0: You need to handle at least one selection mechanism (global or local), defaults, and correctly observe precedence.
14:16:05 [Arle]
.. ITS tools must also process Xlink hrefs in rules.
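Felix's summary of ITS 1.0 conformance (observe defaults and precedence, with local markup overriding global rules) can be illustrated for the translate data category. This is a simplified sketch, not a conformant ITS processor: global rules are given as ElementTree paths instead of XPath selectors, a later matching rule overrides an earlier one, and an element with neither a local nor a global value inherits from its parent, with "yes" as the default at the root. The function name `translate_values` is hypothetical.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
LOCAL_TRANSLATE = f"{{{ITS_NS}}}translate"  # the its:translate attribute


def translate_values(root, rules):
    """rules: (ElementTree path, "yes"|"no") pairs in document order."""
    global_value = {}
    for path, value in rules:
        for el in root.iterfind(path):
            global_value[el] = value  # a later matching rule wins

    result = {}

    def walk(el, inherited):
        local = el.get(LOCAL_TRANSLATE)
        if local is not None:
            value = local              # local markup has highest precedence
        elif el in global_value:
            value = global_value[el]   # then global rules
        else:
            value = inherited          # otherwise inherit from the parent
        result[el] = value
        for child in el:
            walk(child, value)

    walk(root, "yes")  # default for translate is "yes"
    return result


doc = ET.fromstring(
    '<doc xmlns:its="http://www.w3.org/2005/11/its">'
    '<p>Run <code>ls</code> or <code its:translate="yes">help</code></p>'
    '<pre><b>x</b></pre></doc>')
values = translate_values(doc, [(".//code", "no"), (".//pre", "no")])
for el in doc.iter():
    print(el.tag, values[el])
```

The second code element shows the point Felix makes: the global rule says "no", but the local its:translate="yes" must be respected.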
14:16:30 [Arle]
Pedro: In autoLanguageProcessRules, maybe transliteration is the only value that makes sense.
14:17:00 [Arle]
.. I think MT should be handled elsewhere.
14:18:04 [RRSAgent]
I have made the request to generate fsasaki
14:18:24 [leroy]
leroy has joined #mlwdub
14:18:36 [daveL]
daveL has joined #mlwdub
14:18:59 [fsasaki]
action: yves to update the automaticProcessingRule proposal with today's discussion
14:19:00 [trackbot]
Created ACTION-108 - Update the automaticProcessingRule proposal with today's discussion [on Yves Savourel - due 2012-06-20].
14:19:09 [fsasaki]
next: context data category
14:20:24 [fsasaki]
yves: going through context data category, related to tbx term location
14:20:43 [fsasaki]
.. TBX and XLIFF seem to be similar
14:20:59 [fsasaki]
.. seems to be important
14:21:23 [fsasaki]
.. we killed format type because this is similar
14:21:39 [fsasaki]
.. call it "dave is much better with names" :)
14:22:11 [fsasaki]
alex: call this UI controls?
14:22:46 [fsasaki]
.. should one synchronise the names with microsoft related names?
14:23:49 [fsasaki]
des: this is very narrow
14:24:02 [fsasaki]
.. for describing context this is a very limited scope
14:24:38 [fsasaki]
yves: depending on the context value you get one translation or the other
14:24:50 [fsasaki]
.. so having clear values is important
14:25:51 [fsasaki]
richard: does this exist in XLIFF?
14:26:01 [fsasaki]
yves: yes, you have that in XLIFF; restype
14:26:59 [fsasaki]
felix: is XLIFF restype interoperable?
14:27:09 [fsasaki]
yves: for the existing list yes, there are extensions with "x-"
14:27:38 [fsasaki]
.. could also be a namespace based approach
14:27:52 [fsasaki]
david: in software specialized tools that list is used
14:29:02 [fsasaki]
richard: where do XLIFF people get info for this list?
14:29:36 [fsasaki]
scribe did not get the answer
14:29:41 [dgroves]
dgroves has joined #mlwdub
14:30:21 [Arle]
Arle has joined #mlwDub
14:30:35 [fsasaki]
only "intentions", no real commitments yet
14:31:37 [Arle]
David F: I might be able to have a PhD student work on this.
14:31:42 [fsasaki]
david will follow up on "context" with a student of his
14:32:00 [Arle]
Pedro: Let's rename this resourceType.
14:32:21 [Arle]
Yves: Next one is mtConfidence. Do we need the same thing for translations from other sources?
14:33:21 [Arle]
Arle: If you apply it to TM it sounds like fuzzy scores.
14:34:06 [Arle]
David F: Fuzzy scores are not interoperable. Fuzzy scores are basically random numbers. You agree on what a full match is, but nothing beyond that. I think that MT confidence is the same thing: a marketing number.
14:34:39 [Arle]
Pedro: There are five implementation intentions. It's not useful for RbMT, but for SMT it is useful. It is useful within a single tool.
14:34:52 [Arle]
Yves: Combine it with provenance and it is useful.
14:35:24 [Arle]
Declan: Confidence is a big issue right now. It's only valid within a single tool, e.g., benchmarking productivity improvements. We can provide that information, but we don't know how useful it is.
14:35:32 [Arle]
Pedro: Posteditors can use it.
14:35:55 [gderiard]
gderiard has joined #mlwDub
14:36:08 [Arle]
Olaf-Michael: International organizations such as the UN and the EU are working on objective quality measures for MT.
14:36:54 [Arle]
Sebastian: It would be valuable even without objective criteria because you can still assess relative values within a document, collection, etc. It lets you tell which is higher. It is useful for a developer. You don't really need objectivity.
14:37:55 [Arle]
Yves: If you look at MT from Google or Microsoft, it has two scores: one is confidence (based on human input, between 1 and 6). Other systems have other means (e.g., for post-edited, but marked as MT). Crowdsourcing might also be a use.
14:38:12 [Arle]
Dave: It is useful for post-editors to help them focus their work.
14:38:34 [Arle]
.. They don't work in a linear fashion. CAT tool feedback finds this useful.
14:39:27 [Arle]
Johann: It would help us know whether to post-edit or translate from scratch. Something too specific between 0 and 1 might be too limiting. We should account for the different needs.
14:40:40 [Arle]
Arle: The more I think about it the more I think it is general, e.g., for crowdsourced materials.
14:41:16 [Arle]
Tadej: You should prescribe a scale for humans for psychometric validity. A machine scale is different. We need to decide which to support.
14:42:05 [Arle]
David F: We are now looking too broadly beyond the intent. Don't expect interoperability, but within a document, it is valuable, as Sebastian noted.
14:42:20 [Arle]
Dave: Perhaps we put crowd assessment into the quality categories.
14:42:31 [Arle]
Johann: The numbers will vary depending on the desired outcome.
14:43:23 [Arle]
Yves: This one seems important and there is implementation desire. But no clear image of what to implement. So within the next two weeks we need to resolve it.
14:44:14 [Arle]
Action: David to come up with a proposal for mtConfidence within two weeks.
14:44:14 [trackbot]
Sorry, amibiguous username (more than one match) - David
14:44:14 [trackbot]
Try using a different identifier, such as family name or username (eg. dlewis6, dfilip)
14:44:34 [Arle]
Action: dfilip to come up with a proposal for mtConfidence within two weeks.
14:44:34 [trackbot]
Created ACTION-109 - Come up with a proposal for mtConfidence within two weeks. [on David Filip - due 2012-06-20].
14:45:14 [Arle]
Yves: We've seen interest in implementing specialRequirements.
14:45:15 [dgroves]
dF: we have a project at DCU around mtConfidence, so please include us in this discussion
14:45:54 [Arle]
Yves: One problem is that this category is not fleshed out.
14:46:22 [Arle]
Richard: We already have the note category. The difference is that this is more structured and machine processable.
14:46:30 [Arle]
Pedro: There are two intentions for this.
14:46:47 [Arle]
Yves: We need someone to drive it and flesh it out and have a mid-July implementation example.
14:47:18 [Arle]
Felix: Remember you have a number of other ones to move forward. Please keep the priorities in order.
14:47:49 [Arle]
Yves: By mid-July we will drop things and you may waste effort, at least until the next version.
14:48:11 [Arle]
Action to Giuseppi to flesh out specialRequirements.
14:48:11 [trackbot]
Sorry, couldn't find user - to
14:48:27 [Arle]
Action: Giuseppi to flesh out specialRequirements.
14:48:27 [trackbot]
Sorry, couldn't find user - Giuseppi
14:48:56 [fsasaki]
14:49:05 [Arle]
Action: Pedro to have Giuseppe to flesh out specialRequirements.
14:49:06 [trackbot]
Created ACTION-110 - Have Giuseppe to flesh out specialRequirements. [on Pedro Luis Díez Orzas - due 2012-06-20].
14:49:07 [Yves_]
Yves_ has joined #mlwdub
14:49:54 [Arle]
Felix: Looking at the list of prospective categories, we need more information on some of them. We need to refine this list.
14:50:33 [Arle]
Otherwise, this is the list that will be in the public draft. It will be the ITS 1.0 draft with extended categories, plus mappings to RDFa and use in HTML5.
14:50:42 [Arle]
Dave: When does it have to be released?
14:51:01 [iprause]
iprause has joined #mlwDub
14:51:23 [Arle]
Felix: We should have published the draft last month. That's OK because we published the requirements doc. I said that we would publish these categories in the next draft in July.
14:51:38 [Arle]
Action: Felix to publish draft with final list of categories in July.
14:51:38 [trackbot]
Created ACTION-111 - Publish draft with final list of categories in July. [on Felix Sasaki - due 2012-06-20].
14:52:09 [Arle]
Felix: Data category holders, please remember to send out consensus emails.
14:53:24 [Yves_]
scribe: yves_
14:53:26 [RRSAgent]
I have made the request to generate fsasaki
14:53:34 [Yves_]
dF: will go through some issues only
14:54:02 [Yves_]
.. about the relationship with XLIFF: happy that its importance has been stressed
14:54:28 [Yves_]
.. in our charter we have internal and external relationships
14:55:09 [Yves_]
.. dependencies are stronger than liaisons
14:55:25 [Yves_]
.. MLW-LT has one with XLIFF TC
14:55:46 [Yves_]
.. liaisons: with the RDF Web Applications WG
14:56:26 [Yves_]
.. we committed to an RDFa representation to foster integration of MLW into the Semantic Web
14:57:52 [Yves_]
.. Liaison with ULI (Unicode Localization Interoperability), a Unicode technical committee chaired by Helena from IBM
14:58:15 [Yves_]
.. several MLW-LT members are active in ULI
14:58:28 [RRSAgent]
I have made the request to generate Yves_
14:58:47 [RRSAgent]
I have made the request to generate fsasaki
14:58:51 [Yves_]
ULI is looking at a segmentation character proposal
14:59:08 [Yves_]
.. on hold for now.
14:59:20 [Yves_]
.. will go back to ULI for re-work
15:00:10 [Yves_]
.. couple of characters to split or join segments
15:00:30 [Yves_]
.. many issues with the proposal, e.g. it is intended for plain text
15:00:56 [Yves_]
.. maybe have a mapping to elements, as with BiDi control characters
15:01:36 [Yves_]
.. related to our group as well
15:01:59 [Yves_]
.. Also some discussion about the Unicode CLDR register
15:02:38 [Yves_]
Arle: need to assign an action for the register
15:03:01 [Yves_]
Felix: didn't we decide to drop register?
15:03:11 [Yves_]
Arle: not if Pedro does it
15:03:50 [RRSAgent]
I have made the request to generate Yves_
15:04:05 [Yves_]
dF: initiating the discussion would be good
15:04:36 [Yves_]
.. We also have a liaison with ETSI ISG LIS
15:05:00 [Yves_]
.. LISA is gone
15:06:08 [Yves_]
.. OSCAR, the LISA working group responsible for TMX, SRX, etc., was killed in the process
15:06:33 [Yves_]
.. ETSI is a telecom standards body
15:06:39 [Yves_]
.. not very transparent
15:07:03 [Yves_]
.. I would like to have a formal liaison between MLW-LT and ETSI
15:07:43 [Yves_]
.. Arle is a member and could be the liaison agent
15:08:22 [Yves_]
s/w /we /
15:08:35 [Yves_]
.. we don't want to be surprised by a new standard
15:09:24 [Yves_]
Arle: names and logos of OSCAR standards are protected; everything else is under Creative Commons
15:10:10 [Yves_]
dF: some LISA standards were co-owned with ISO, so more visible.
15:10:20 [Yves_]
.. but SRX, TMX, etc. are not
15:10:58 [Yves_]
.. personally: thinks TMX 1.4b is too old to keep up with modern data
15:11:23 [Yves_]
.. and share a lot
15:11:45 [Yves_]
.. potentially TMX and XLIFF share inline elements
15:12:35 [Yves_]
.. Would propose Arle as representative to ETSI
15:12:46 [Yves_]
felix: ETSI is not in the charter
15:12:55 [Yves_]
15:13:31 [Yves_]
Felix: liaisons are weak connections, much less important than dependencies
15:13:58 [Yves_]
dF: I think ETSI is very closed, so having a liaison with it may be good.
15:14:23 [Yves_]
Pedro: in 10 days ISO TC 37 will have a meeting in Madrid
15:14:44 [Yves_]
.. MLW will be there and offer refreshments and a 10-minute presentation
15:15:02 [Yves_]
.. I'm asking for input, ideas, etc.
15:15:24 [Yves_]
Arle: Alan will be there too.
15:15:41 [Arle]
Action: Arle to interface with Pedro on presentation to TC37 by June 20.
15:15:41 [trackbot]
Created ACTION-112 - Interface with Pedro on presentation to TC37 by June 20. [on Arle Lommel - due 2012-06-20].
15:16:21 [Yves_]
Felix: maybe Pedro can show our agreed-upon list of data categories to get feedback
15:16:29 [Yves_]
.. as well as any extra ones
15:16:35 [Arle]
15:16:50 [Arle]
15:16:57 [Yves_]
Olaf-Michael: Alan Melby will be in Madrid. You should coordinate with him too.
15:17:17 [Arle]
15:17:40 [Yves_]
Felix: about the ETSI liaison: what kind of info would we provide and get?
15:17:59 [Yves_]
dF: reports/updates from Arle on the ETSI activities
15:18:10 [Yves_]
Arle: yes I could provide updates
15:18:53 [Yves_]
dF: need the liaison because ETSI has no public feedback mechanism
15:19:09 [Yves_]
dF: also SRX may be interesting
15:19:57 [Yves_]
David moves to nominate Arle as liaison with ETSI for information exchange
15:20:05 [fsasaki]
15:20:18 [fsasaki]
15:20:30 [Yves_]
Felix seconds
15:20:47 [Yves_]
15:20:49 [jr]
jr has joined #mlwdub
15:20:51 [Yves_]
no dissent
15:21:06 [Yves_]
dF: About XLIFF
15:21:17 [Yves_]
.. currently the TC is working on 2.0
15:21:46 [Yves_]
.. e.g. a ballot on whether to allow custom namespaces.
15:22:01 [Yves_]
.. to me it's a future-proofing measure
15:22:33 [Yves_]
.. But those against also have good reasons to worry, because extensions were abused in 1.2
15:23:25 [Yves_]
.. 2.0 will have conformance statements against abuses
15:23:37 [RRSAgent]
I have made the request to generate Yves_
15:24:46 [Yves_]
.. you can join the XLIFF TC easily
15:25:06 [Yves_]
.. much simpler and cheaper than W3C; you can also interact via the comments list
15:25:25 [Yves_]
.. OASIS is as transparent as W3C
15:25:57 [Yves_]
DaveL: what is the timeline for XLIFF 2.0?
15:26:35 [Yves_]
dF: Twice as many members since last year
15:26:42 [Yves_]
no calendar dates set up
15:26:47 [Yves_]
for now
15:27:24 [Yves_]
.. times are different now. My perception is that important players are in now
15:27:51 [Yves_]
Bryan: it would be a failure if the draft were not ready in 2013
15:28:26 [Yves_]
dF: there is some movement toward a first draft. The 3rd XLIFF Symposium is in Seattle in October
15:28:59 [Yves_]
.. core should be close to completion by October
15:29:33 [RRSAgent]
I have made the request to generate Yves_
15:30:11 [Yves_]
Session closed
15:30:23 [Yves_]
Arle: Arle and Pedro will be the two members in Madrid for TC 37
15:30:29 [RRSAgent]
I have made the request to generate Yves_
15:31:43 [fsasaki]
for upcoming WG calls, see
15:31:52 [Yves_]
Events: LocWorld, 15-17 Oct
15:32:07 [Yves_]
Prague F2F 25-26 Nov
15:32:23 [RRSAgent]
I have made the request to generate Yves_
15:32:30 [dF]
dF has joined #mlwdub
15:32:55 [Yves_]
Thanks to CNGL for the support for this Workshop
15:33:27 [Yves_]
thanks to Trinity College for the support
15:34:08 [Yves_]
Thanks to Eithne for the support
15:34:23 [Yves_]
thanks to the session leaders and scribes
15:34:36 [Yves_]
Thanks to Leroy for the filming
15:35:02 [Yves_]
Thanks to dotNet: they will provide captions for the video
15:36:08 [Yves__]
Yves__ has joined #mlwdub
15:36:30 [RRSAgent]
I have made the request to generate Yves__
15:37:42 [gderiard]
gderiard has joined #mlwDub
15:39:23 [fsasaki]
rrsagent, bye
15:39:23 [RRSAgent]
I see 17 open action items saved in :
15:39:23 [RRSAgent]
ACTION: Tadej to Write proposal how mapping between NIF and HTML+ITS would look like with concrete examples [1]
15:39:23 [RRSAgent]
recorded in
15:39:23 [RRSAgent]
ACTION: Felix to Draft email to XLIFF committee about improving extensibility [due 2012-06-15] [2]
15:39:23 [RRSAgent]
recorded in
15:39:23 [RRSAgent]
ACTION: Jirka to arrange F2F meeting in September at UEP [3]
15:39:23 [RRSAgent]
recorded in
15:39:23 [RRSAgent]
ACTION: Felix to Summarize discussion around Domain [4]
15:39:23 [RRSAgent]
recorded in
15:39:23 [RRSAgent]
ACTION: Des to write example for how to deal with invalid cache (re: localisationCache) [5]
15:39:23 [RRSAgent]
recorded in
15:39:23 [RRSAgent]
ACTION: Item to Felix to check with W3C on status of the Provenance group to manage the dependency risk [6]
15:39:23 [RRSAgent]
recorded in
15:39:23 [RRSAgent]
ACTION: Item to fsasaki to check with W3C on status of the Provenance group to manage the dependency risk [7]
15:39:23 [RRSAgent]
recorded in
15:39:23 [RRSAgent]
ACTION: Item to Felix to follow up on usage of BCP 47 [8]
15:39:23 [RRSAgent]
recorded in
15:39:23 [RRSAgent]
ACTION: shaun to flesh out locale proposal [9]
15:39:23 [RRSAgent]
recorded in
15:39:23 [RRSAgent]
ACTION: Item to fsasaki to follow up on usage of BCP 47 [10]
15:39:23 [RRSAgent]
recorded in
15:39:23 [RRSAgent]
ACTION: yves to update the automaticProcessingRule proposal with today's discussion [11]
15:39:23 [RRSAgent]
recorded in
15:39:23 [RRSAgent]
ACTION: David to come up with a proposal for mtConfidence within two weeks. [12]
15:39:23 [RRSAgent]
recorded in
15:39:23 [RRSAgent]
ACTION: dfilip to come up with a proposal for mtConfidence within two weeks. [13]
15:39:23 [RRSAgent]
recorded in
15:39:23 [RRSAgent]
ACTION: Giuseppi to flesh out specialRequirements. [14]
15:39:23 [RRSAgent]
recorded in
15:39:23 [RRSAgent]
ACTION: Pedro to have Giuseppe to flesh out specialRequirements. [15]
15:39:23 [RRSAgent]
recorded in
15:39:23 [RRSAgent]
ACTION: Felix to publish draft with final list of categories in July. [16]
15:39:23 [RRSAgent]
recorded in
15:39:23 [RRSAgent]
ACTION: Arle to interface with Pedro on presentation to TC37 by June 20. [17]
15:39:23 [RRSAgent]
recorded in