IRC log of mlwDub on 2012-06-12

Timestamps are in UTC.

07:40:04 [RRSAgent]
RRSAgent has joined #mlwDub
07:40:04 [RRSAgent]
logging to
07:44:51 [Yves_]
Meeting: MultilingualWeb-LT Workshop
07:46:46 [Yves_]
07:48:31 [Yves_]
Scribe: Yves
07:48:35 [fsasaki]
fsasaki has joined #mlwdub
07:48:38 [Yves_]
ScribeNick: Yves_
07:48:55 [RRSAgent]
I have made the request to generate fsasaki
07:49:17 [daveL]
daveL has joined #mlwDub
07:49:36 [fsasaki]
present: many, manyy, manyyy, people
07:49:39 [fsasaki]
chair: Arle
07:49:45 [fsasaki]
scribe: Yves_
07:49:57 [fsasaki]
topic: intro
07:50:00 [fsasaki]
meeting not started yet
07:50:04 [RRSAgent]
I have made the request to generate fsasaki
07:52:21 [RRSAgent]
I have made the request to generate fsasaki
07:53:13 [fsasaki]
fsasaki has joined #mlwdub
07:53:28 [fsasaki]
07:53:31 [fsasaki]
07:53:43 [philr]
philr has joined #mlwdub
07:55:31 [Milan]
Milan has joined #mlwDub
08:04:51 [leroy]
leroy has joined #mlwDub
08:05:05 [Milan]
Milan has joined #mlwdub
08:05:28 [Arle]
Arle has joined #mlwdub
08:07:29 [Yves_]
felix: today we'll work on the ITS2.0 requirements
08:07:40 [Yves_]
.. first let's introduce ourselves
08:08:09 [tadej]
tadej has joined #mlwDub
08:08:19 [mhellwig]
mhellwig has joined #mlwdub
08:08:33 [Pedro]
Pedro has joined #mlwDub
08:08:34 [moran]
moran has joined #mlwdub
08:08:45 [Rob]
Rob has joined #mlwdub
08:08:49 [Sebastian]
Sebastian has joined #mlwdub
08:08:56 [Bryans]
Bryans has joined #mlwdub
08:08:59 [omstefanov]
omstefanov has joined #mlwdub
08:09:05 [XavierMaza_GALA]
XavierMaza_GALA has joined #mlwdub
08:09:06 [nico]
nico has joined #mlwDub
08:09:56 [philr]
Phil Ritchie, CTO at VistaTEC. Industrial Partners in CNGL and MLW-LT. Within MLW-LT interested in the encapsulation of linguistic quality information and provenance within metadata.
08:10:33 [ChrisLyons]
ChrisLyons has joined #mlwDub
08:10:55 [peter]
peter has joined #mlwdub
08:11:32 [jakob]
jakob has joined #mlwdub
08:12:05 [Des]
Des has joined #mlwdub
08:15:37 [thomas]
thomas has joined #mlwDub
08:16:42 [RRSAgent]
I have made the request to generate Yves_
08:17:17 [Sebastian]
Sebastian Sklarss, Interoperability and Open Data Consultant at medium-sized privately owned company ]init[. My colleague Horst Kraemer will join later. Interested in implementing ITS in our customers' CMS
08:17:24 [XavierMaza_GALA]
Xavier Maza, Language Services Manager at iDISC and GALA (Globalization and Localization Association) board member, interested in hearing from you to take back to our membership.
08:17:54 [Yves_]
Felix: ITS 2.0 was started at the beginning of this year
08:18:34 [Yves_]
... some people may not know ITS 1.0 very well, so I'll try to summarize it
08:19:02 [Yves_]
.. ITS defines "data categories"
08:19:44 [Yves_]
.. they are separate items (not necessarily related), allowing flexibility
08:20:00 [Yves_]
.. we provide non-application-specific definitions
08:20:18 [Yves_]
.. it's ok to implement only some data categories, not all
08:20:53 [Yves_]
.. for example the Translate data category
08:21:17 [Yves_]
.. you can express it locally (its:translate on an element)
08:21:45 [Yves_]
.. HTML5 also implements that data category: the 'translate' attribute
08:22:11 [Yves_]
.. it's easy to map the implementations
08:22:23 [Tony]
Tony has joined #mlwdub
08:22:30 [Yves_]
.. In addition ITS offers the "global" approach
08:23:11 [Yves_]
.. ITS 1.0 offers global rules using XPath selectors that select to which nodes the data applies
08:23:42 [dgroves]
dgroves has joined #mlwdub
08:24:05 [r12a]
r12a has joined #mlwdub
08:24:17 [Yves_]
.. you can compare this to CSS: defaults, rules in files, rules in the document itself, and locally as well.
08:24:42 [Yves_]
.. In ITS 2.0 we want to apply ITS in HTML5, CMS content, etc.
08:25:08 [Yves_]
.. we want also to have some bridges to the semantic web
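A minimal sketch of the local versus global approach Felix describes, assuming Python's standard xml.etree.ElementTree (which supports only a limited XPath subset). The sample document, the GLOBAL_RULES list and the translate_flags helper are hypothetical illustrations, not part of ITS. The precedence used here (local attribute, then global rule, then value inherited from the parent, then the default "yes") is our reading of the ITS 1.0 Translate data category.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"  # ITS 1.0 namespace

# Hypothetical sample document with two local its:translate attributes.
DOC = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <title>User guide</title>
  <para>Press <code>Ctrl+C</code> to copy.</para>
  <para its:translate="no">ACME-1234 <b>rev B</b></para>
  <code its:translate="yes">translatable sample</code>
</doc>"""

# Hypothetical global rules as (selector, translate) pairs; later rules win.
GLOBAL_RULES = [(".//code", "no")]


def translate_flags(root, rules, default="yes"):
    """Return {element: "yes" | "no"} for every element under root."""
    # Step 1: record which elements each global rule selects.
    selected = {}
    for selector, value in rules:
        for elem in root.findall(selector):
            selected[elem] = value  # a later rule overwrites an earlier one

    # Step 2: walk top-down so that children can inherit from parents.
    flags = {}

    def walk(elem, inherited):
        local = elem.get(f"{{{ITS_NS}}}translate")
        value = local or selected.get(elem) or inherited
        flags[elem] = value
        for child in elem:
            walk(child, value)

    walk(root, default)
    return flags


ROOT = ET.fromstring(DOC)
FLAGS = translate_flags(ROOT, GLOBAL_RULES)
RESULT = [(elem.tag, FLAGS[elem]) for elem in ROOT.iter()]
for tag, flag in RESULT:
    print(tag, flag)
```

The last code element shows the CSS-like layering: the global rule says "no", but the local attribute on that element overrides it.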
08:25:31 [RRSAgent]
I have made the request to generate Yves_
08:26:53 [Yves_]
Felix shows Richard's test for the HTML5 translate attribute using different systems.
08:28:34 [r12a]
08:28:47 [Yves_]
Felix: ITS 1.0 has 7 data categories, focused on XML
08:29:08 [r12a]
note that these test results need updating - last week i found out that MS now produces positive results for all tests
08:29:36 [dF]
dF has joined #mlwDub
08:29:50 [r12a]
ie. <span translate=no>.....<span translate=yes>..... </span> ....</span> now works
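The nesting case Richard mentions can be sketched as follows, assuming Python's standard html.parser module. The TranslateTracker class and the sample markup are hypothetical; the sketch only tracks the translate attribute on properly nested elements and skips a few void elements, so it is an illustration of the inheritance idea rather than a conforming HTML5 implementation.

```python
from html.parser import HTMLParser

# Elements without an end tag; they must not push onto the stack.
VOID = {"br", "hr", "img", "input", "meta", "link"}


class TranslateTracker(HTMLParser):
    """Collect text runs together with their effective translate state."""

    def __init__(self):
        super().__init__()
        self.stack = ["yes"]  # assumed default: content is translatable
        self.segments = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        value = dict(attrs).get("translate")
        # An explicit yes/no wins; anything else inherits from the parent.
        self.stack.append(value if value in ("yes", "no") else self.stack[-1])

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.segments.append((text, self.stack[-1]))


SAMPLE = (
    "<p>Hello <span translate=no>ACME "
    "<span translate=yes>is great</span> v2</span> bye</p>"
)

tracker = TranslateTracker()
tracker.feed(SAMPLE)
tracker.close()
SEGMENTS = tracker.segments
for text, state in SEGMENTS:
    print(state, text)
```

A stack is used so that leaving the inner span restores the state of the outer one, which is the behaviour the nested test checks.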
08:33:52 [Yves_]
Yves_ has joined #mlwdub
08:34:00 [Arle]
ITS 2.0 Disambiguation would allow linking to thesauri, etc. for MT.
08:34:07 [Yves_]
scribeNick: yves_
08:34:21 [Arle]
s/ITS 2.0/Felix: ITS 2.0/
08:35:08 [Yves_]
Felix: other data categories: text analysis annotation
08:36:37 [Arle]
David Filip: How is ITS 1.0 term different from disambiguation?
08:36:56 [Yves__]
Yves__ has joined #mlwdub
08:37:35 [Arle]
Felix: term is not application specific, is a general item. disambiguation data is specific to this and ties to resources specifically for the purpose of disambiguation. But we need to discuss these details to finalize our work.
08:37:54 [Arle]
.. We need disambiguation in other areas, so this is designed for that purpose.
08:38:29 [Arle]
Richard: There were some other categories in ITS 1.0 you didn't show in the ITS 2.0 slide. Will they be dropped?
08:39:40 [Arle]
Felix: It was just that nobody showed interest in working on them here. But because the data categories are independent, we don't have to deal with them. They may be handled elsewhere. But in any event, we keep ITS 1.0 categories. We may point to them somewhere else or develop them further.
08:40:24 [Yves_]
Yves_ has joined #mlwDub
08:40:24 [Arle]
Richard: I'm worried that we might lose important things like directionality. It is useful for people using XML to have guidance. We don't want to drop them.
08:40:58 [Arle]
Felix: The list I showed is taken from the ToC of ITS 1.0. ITS 2.0 will contain all of them and then add more. So all of them will remain in ITS.
08:41:12 [RRSAgent]
I have made the request to generate Yves_
08:41:28 [Arle]
.. We will give guidance for what to do, but the actual specification may point to work in another working group, but we don't drop it.
08:41:40 [Arle]
.. E.g., the HTML5 working group might define parts of these.
08:42:09 [Arle]
Dave Lewis: The lists you showed are snapshots from today. All it means is that there was discussion about some points and others. It's where we are today, but it can be changed.
08:43:19 [Arle]
Olaf-Michael Stefanov: Just because something from ITS 1.0 is not on the ITS 2.0 list does not mean it will be dropped. Just because our group doesn't implement it does not mean we can't refer to other specifications for those points.
08:43:39 [Arle]
s/Olaf-Michael Stefanov/Felix/
08:44:12 [Arle]
Felix: We need to get to concrete details to find consensus and implementation commitments.
08:44:32 [Arle]
.. We need to decide how to implement these categories in various formats.
08:44:52 [Arle]
.. We have consensus on a small set of ideas and only ideas for the others, so we need to come to consensus.
08:45:44 [Arle]
.. Our time-frame is that we need the general framework by the end of July. That does not mean all details need to be sorted out, but we need to have the list nailed down, with a list of what is to "be in the basket". We need a draft by October and a stable draft by November.
08:46:03 [RRSAgent]
I have made the request to generate Yves_
08:47:12 [Arle]
.. The group is funded by the EU. We need to be as implementation-driven as possible. E.g., the translate data category really helps convey the message about what can be done and also shows issues. If you follow Richard's test, it shows issues with nesting of different translate states. That is not handled yet. By prototyping simple categories we can tell what is feasible.
08:47:55 [Arle]
.. For participants, please think about what you really want to work on before the summer break. Do it for at least HTML 5 and XML, using both local and global (XPath) markup.
08:48:09 [Arle]
.. Also, engage customers to see what they want to do. Use real-world testing.
08:48:49 [Arle]
.. It is a chaotic process. Start with playing with stuff to see what works. When I say play and prototyping, those outside the group might ask what sorts of implementations might be produced.
08:49:40 [RRSAgent]
I have made the request to generate Yves_
08:49:53 [Arle]
.. Go to the MLW-LT homepage and see what deliverables are needed. It shows the areas where we expect to see implementations, e.g., Drupal for CMS by Cocomore, annotation by Tadej, in MT (Linguaserve, DCU), annotation of MT data, quality (Phil), etc.
08:50:19 [Arle]
.. At the end of the day we need stable implementations created using the EU funding.
08:50:29 [Arle]
.. We need more implementation experience.
08:51:04 [Arle]
Richard: Microsoft Translator does now support nested translate attributes properly. I just tested it and it worked.
08:51:06 [r12a]
08:52:01 [Arle]
Felix: The purpose is to get your ideas and commitments. Look at the requirements document for the ones where we have consensus. We need to understand what is important for which community. We also want to see how you can make money by seeing the value for the cost of changing to use these.
08:52:18 [Arle]
.. Find where it makes the most sense/value. We need business case-level arguments.
08:53:07 [Arle]
.. The group is moving forward a lot. The chairs and participants provide some pointers for the discussion aligned with the sessions. Use the mail you got to help guide the discussion.
08:53:29 [Arle]
Richard: What's the process for stating that you like a category and deciding whether it is in or out?
08:53:46 [RRSAgent]
I have made the request to generate Yves_
08:54:10 [Arle]
Felix: Join the IRC and when we discuss the categories and implementation, mention your support and concerns there. After the meeting we will analyze the comments to see what we need to take into account, what people supported.
08:54:14 [mlefranc]
mlefranc has joined #mlwdub
08:54:54 [Yves_]
Yves_ has joined #mlwDub
08:54:58 [Arle]
Topic: Work session on representation formats
08:56:17 [Arle]
Jirka: Maxime will talk about issues and then I will discuss other issues.
08:56:25 [Arle]
Topic: Maxime's presentation
08:56:33 [iprause]
iprause has joined #mlwDub
08:57:44 [fsasaki]
scribe: fsasaki
08:57:49 [micha]
micha has joined #mlwdub
08:58:17 [PaulMac]
PaulMac has joined #mlwDub
08:58:29 [Jirka]
Jirka has joined #mlwDub
08:58:43 [kurzum]
kurzum has joined #mlwdub
08:58:55 [fsasaki]
maxime: RDFa representation format - drop as a requirement?
08:59:06 [fsasaki]
.. no, since RDFa mapping of data categories is in the working group charter
08:59:49 [fsasaki]
.. different conceptualization: RDFa is for statements embedded in HTML, ITS is about a specific piece of content
09:00:12 [fsasaki]
.. Sebastian Hellmann proposed the NIF format to have context based URIs
09:00:40 [fsasaki]
.. two approaches to generate URIs: hash based or XPointer based
09:01:04 [r12a]
i guess a link to the requirements doc would be useful:
09:02:22 [fsasaki]
.. comparison of what can be selected with NIF, CSS selectors, XPath 1.0 / 2.0, XPointer
09:02:24 [iprause]
iprause has joined #mlwDub
09:02:55 [RRSAgent]
I have made the request to generate fsasaki
09:04:21 [fsasaki]
maxime: XPointer 1.0 has a small extension to XPath that is hard to implement
09:04:41 [fsasaki]
.. big issue in RDFa - how to deal with inheritance and overriding
09:05:03 [fsasaki]
.. probably be out of scope for us
09:05:13 [fsasaki]
.. CURIEs - use URIs with less verbosity
09:05:37 [Arle]
I have a concern about the statement about MUST support. It runs into compliance issues for us since you only need to implement one data category in one format to claim compliance. If you are interested only in implementing translate in HTML5, this proposal would seem to require you to support stuff you don't care about or actually need.
09:06:00 [fsasaki]
Arle, where is MUST written? In Maxime's presentation?
09:06:07 [Arle]
09:06:48 [Pedro]
Pedro has joined #mlwDub
09:07:02 [fsasaki]
maxime: consumers of ITS could use CURIEs to shorten URIs
09:07:18 [fsasaki]
.. inspiration from Provenance wg: they deal with XML and RDF at the same time
09:07:35 [fsasaki]
.. PROV data model, PROV ontology,
09:07:43 [fsasaki]
.. suggestion would be to have multiple facets
09:07:57 [Arle]
Slide 11: "ITS 2.0 implementations MUST implement XPointer"
09:08:03 [fsasaki]
.. ITS data model, ITS-XML, ITS-O (Ontology with mapping ITS to RDF)
09:08:11 [fsasaki]
.. ITS-HTML, its-* attributes
09:08:15 [fsasaki]
09:08:21 [fsasaki]
.. ITS-HTML-Microdata
09:09:12 [fsasaki]
maxime: provenance model relates agents and activities
09:09:25 [fsasaki]
.. e.g. "translator leads LT-activities on fragments of text"
09:09:48 [fsasaki]
.. suggestion to define prov:Organization, prov:Person, prov:SoftwareAgent, ...
09:10:21 [fsasaki]
.. as agents; activities are human translation, machine translation, quality assessment
09:10:37 [RRSAgent]
I have made the request to generate fsasaki
09:12:28 [fsasaki]
maxime: issues of local ITS annotation: can't express complex sets of ITS attributes, e.g. its-* elements
09:12:44 [fsasaki]
.. possible solutions: write ITS-XML directly in a script element or a head element
09:13:50 [fsasaki]
.. finally - about ITS namespace, should it be kept?
09:14:18 [daveL]
TCD's CMS LION implementation and its use of provenance documented in recent LREC paper:
09:14:34 [fsasaki]
.. can be kept if you use ITS with content negotiation, example with SKOS that is always redirected to latest version of a schema
09:14:58 [fsasaki]
.. for us, we would have a URI of ITS 2.0 specification
09:15:33 [fsasaki]
.. and content neg for various schemas
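Maxime's point about CURIEs (using URIs with less verbosity) can be sketched as below. This is an illustration only: the PREFIXES table and both helper functions are hypothetical, and the "ex" namespace is a placeholder. Only the prov namespace URI is the one published by the W3C Provenance work.

```python
# Hypothetical prefix table; "ex" is a placeholder namespace.
PREFIXES = {
    "prov": "http://www.w3.org/ns/prov#",
    "ex": "http://example.org/terms/",
}


def expand_curie(curie, prefixes):
    """Turn a compact 'prefix:reference' into a full URI."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + reference


def compact_uri(uri, prefixes):
    """Turn a full URI into a CURIE, preferring the longest namespace match."""
    matches = [(ns, p) for p, ns in prefixes.items() if uri.startswith(ns)]
    if not matches:
        return uri  # nothing to shorten
    namespace, prefix = max(matches, key=lambda m: len(m[0]))
    return f"{prefix}:{uri[len(namespace):]}"


EXPANDED = expand_curie("prov:Person", PREFIXES)
COMPACTED = compact_uri("http://www.w3.org/ns/prov#SoftwareAgent", PREFIXES)
print(EXPANDED)
print(COMPACTED)
```

A consumer that does not know the prefix table cannot expand the CURIE, which is why the table has to travel with the data or be fixed by the specification.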
09:17:18 [Yalemisew]
Yalemisew has joined #mlwdub
09:17:29 [gderiard]
gderiard has joined #mlwdub
09:17:46 [Pedro]
q+ The presentation will be available in the wiki?
09:18:42 [fsasaki]
09:18:48 [fsasaki]
pedro, yes, it will be in the wiki
09:19:26 [mlefranc]
to [11:05] <Arle> , I agree, that would only be needed for data categories that have CURIEs as Datatype
09:20:14 [fsasaki]
jirka is describing how representation is done so far for XML in ITS 1.0 and HTML5
09:21:08 [RRSAgent]
I have made the request to generate fsasaki
09:21:28 [fsasaki]
jirka: after discussion with HTML working group, we decided to use its- attributes
09:21:45 [fsasaki]
q+ to mention importance of cooperation with other WGs wrt representation of ITS
09:22:42 [Arle]
maxime, thanks for clarifying. That wasn't necessarily clear from the slide, where it looked like a general requirement for ITS 2.0 conformance. So the proposal would need to be hedged a bit.
09:23:01 [Arle]
Preliminary slides page URL will be:
09:23:16 [fsasaki]
thanks a lot, Arle
09:23:45 [fsasaki]
jirka showing how global rules can be linked from an HTML document
09:24:13 [fsasaki]
jirka showing microdata mapping from its- attributes
09:25:31 [fsasaki]
jirka: meta element cannot be used everywhere in HTML, that might be an issue for microdata mapping of ITS
09:25:59 [fsasaki]
.. RDFa is most problematic - jirka proposes to kill that proposal
09:26:36 [fsasaki]
.. maxime had some ideas to do that; now I am worried about this mapping
09:27:12 [fsasaki]
.. RDFa is another syntax for expressing RDF triples
09:27:34 [fsasaki]
.. so the subject of the triple is the whole page, not the original piece of content
09:28:49 [fsasaki]
.. we promised a mapping to RDFa, so I'd be happy if we have a proposal to work with NIF
09:29:19 [fsasaki]
.. maxime: agree that people from the SW area need to have ideas
09:30:06 [fsasaki]
action: maxime to lead discussion on RDF serialization in ITS, with "task force" people - Sebastian, Maxime, Dave, ...
09:30:39 [r12a]
09:30:47 [fsasaki]
dave: is there a use case for RDFa expressing ITS?
09:31:11 [fsasaki]
tadej: i can provide the data, but if nobody is consuming it that's a problem
09:31:42 [fsasaki]
.. text analytics software provides info often via URIs
09:31:49 [fsasaki]
.. that can be expressed without RDFa
09:32:06 [fsasaki]
.. so we can provide RDFa easily, but does it make sense to use it
09:32:48 [fsasaki]
.. NIF addresses the issues with RDFa
09:33:40 [fsasaki]
davidF: maxime prepared great stuff
09:33:46 [fsasaki]
.. but I would ask for a use case
09:34:51 [fsasaki]
davidF: maybe a clarification of the charter would help
09:37:11 [fsasaki]
action: Felix to work on charter clarification
09:38:02 [fsasaki]
maxime: we could use ITS in RDF to localize ontologies
09:38:45 [fsasaki]
dave: are we taking localization of ontologies as a requirement on board?
09:40:03 [fsasaki]
.. also, provenance that we are working on (RDF based) does not require RDFa
09:40:36 [fsasaki]
.. question is really if we have a use case for generating an RDF graph from the content
09:41:10 [RRSAgent]
I have made the request to generate fsasaki
09:41:53 [fsasaki]
jirka: other issues - how to represent in former versions of HTML
09:41:58 [fsasaki]
.. e.g. HTML 4 or HTML 3.2
09:42:16 [fsasaki]
.. I think we don't need to provide that - even in HTML 3.2 or HTML 4 you can use the its- attribute
09:42:40 [fsasaki]
action: jirka to make a clarification in the req draft about previous versions of HTML
09:43:38 [fsasaki]
above is ISSUE-19
09:43:54 [fsasaki]
jirka: XPointer - is still a working draft, will probably not be finished in time
09:44:00 [RRSAgent]
I have made the request to generate fsasaki
09:44:08 [fsasaki]
09:44:12 [fsasaki]
ack f
09:44:12 [Zakim]
fsasaki, you wanted to mention importance of cooperation with other WGs wrt representation of ITS
09:44:46 [fsasaki]
maxime: we could use XPointer without the extension of range
09:44:55 [fsasaki]
.. we would have the benefits of URIs
09:45:05 [fsasaki]
jirka: so in a selector, a URI that can use an XPointer fragment?
09:45:16 [fsasaki]
maxime: yes
09:45:35 [fsasaki]
jirka: that will be used in global rules?
09:45:49 [fsasaki]
maxime: yes
09:46:00 [fsasaki]
jirka: will be inconvenient for current ITS usage
09:46:10 [fsasaki]
.. currently you say "all titles in all chapters" via XPath
09:46:29 [fsasaki]
.. with XPointer you would only address a particular document identified by the URI
09:46:40 [fsasaki]
.. so there might be no real use case to switch to XPointer
09:46:47 [fsasaki]
.. so I'd propose not to use the idea now
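Jirka's contrast between a path-based selector and a URI-based pointer can be sketched as follows, assuming Python's standard xml.etree.ElementTree. The document names, their content and the pointer string are hypothetical; the xpointer(...) fragment is shown only as a string that is split, not evaluated.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that share the same vocabulary.
DOCS = {
    "guide.xml": (
        "<book><chapter><title>Setup</title><p>text</p></chapter>"
        "<chapter><title>Usage</title></chapter></book>"
    ),
    "faq.xml": "<book><chapter><title>Questions</title></chapter></book>",
}

# One path-based selector: "all titles in all chapters", for any document.
SELECTOR = ".//chapter/title"


def select_titles(docs, selector):
    """Apply the same selector to every document."""
    return {
        name: [t.text for t in ET.fromstring(xml).findall(selector)]
        for name, xml in docs.items()
    }


TITLES = select_titles(DOCS, SELECTOR)
print(TITLES)

# A URI with a fragment, by contrast, names exactly one document.
POINTER = "http://example.org/guide.xml#xpointer(/book/chapter[1]/title)"
DOC_URI, _, FRAGMENT = POINTER.partition("#")
print(DOC_URI)
print(FRAGMENT)
```

The selector is reusable across a whole document set, while the pointer would need to be rewritten per document, which is the inconvenience raised for global rules.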
09:47:16 [fsasaki]
davidF: if XPointer spec will not be finished, we cannot do this
09:49:56 [fsasaki]
jirka: maxime was proposing to publish schemas with content negotiation
09:50:25 [fsasaki]
.. these techniques are controversial also in SW - need to provide things that run more automatic over HTTP
09:51:07 [fsasaki]
davidF: like the idea of content negotiation
09:51:51 [fsasaki]
maxime: not for DTD or XSD, but other areas it might be relevant
09:52:17 [fsasaki]
felix: at dF, the content negotiation is about the schemas
09:52:52 [fsasaki]
.. so probably a different case than df has in mind
09:53:00 [fsasaki]
sebastian: you have an ontology in a sense
09:53:21 [fsasaki]
jirka: I mean ontology in terms of OWL
09:53:29 [fsasaki]
sebastian: can be easily created
09:53:55 [fsasaki]
felix: is there a use case for the ontology?
09:53:59 [fsasaki]
sebastian: not sure
09:54:06 [fsasaki]
provenance issue will be postponed
10:15:09 [Pedro]
Pedro has joined #mlwDub
10:18:45 [Rob]
Rob has joined #mlwdub
10:20:04 [dF]
dF has joined #mlwdub
10:21:32 [Pedro]
Working Session: Quality Metadata
10:26:35 [fsasaki]
topic: quality metadata
10:26:52 [omstefanov]
omstefanov has joined #mlwdub
10:27:36 [fsasaki]
for more info on this session, see and
10:28:07 [Pedro]
Phil: Their interest is language quality and QA. Language review process can be very complex.
10:28:30 [Pedro]
.. This is an opportunity to see new approaches and solutions.
10:29:18 [Pedro]
.. Additional support of the audience will be very interesting for the implementations of these data categories.
10:29:49 [Yalemisew]
Yalemisew has joined #mlwdub
10:30:16 [Pedro]
Arle: The target audiences for Quality Metadata are: LSPs doing QA, Content Creators doing quality verification,
10:30:52 [Pedro]
.. Authors marking errors and posteditors providing info on efficiency / MT quality.
10:31:46 [Pedro]
.. Motivations: 85% of QA is spent on about 10% of content.
10:31:54 [mlefranc]
mlefranc has joined #mlwdub
10:32:01 [Pedro]
.. so there are potential cost savings.
10:32:42 [Pedro]
.. to capture problems systematically when you do not know where they come from. Provenance will help.
10:33:04 [fsasaki]
q+ domain / purpose values
10:33:08 [fsasaki]
10:33:14 [fsasaki]
q+ to question domain / purpose values
10:33:29 [fsasaki]
q- domain
10:33:33 [Pedro]
... Some other data categories (purpose, domain...) can help to build business rules
10:33:34 [fsasaki]
q- /
10:33:38 [fsasaki]
q- purpose
10:33:41 [fsasaki]
q- values
10:33:49 [fsasaki]
10:33:56 [fsasaki]
q+ to ask about domain / purpose values
10:34:06 [Pedro]
Arle: Two complex Data Categories: errors and error profiles
10:34:10 [dgroves]
dgroves has joined #mlwdub
10:34:42 [Pedro]
.. the point is to have DC independently of the metric
10:34:45 [gderiard]
gderiard has joined #mlwdub
10:34:51 [daveL]
Quality data categories in requirements doc at:
10:35:44 [Pedro]
.. the data model can be very simple, but some examples seem to need more complex attributes
10:36:13 [fsasaki]
q+ to ask about why qa-note
10:36:18 [Pedro]
.. a simple syntax can refer to the basic error parameters.
10:37:12 [Pedro]
Richard: too many attributes and info in that model
10:37:23 [Pedro]
Arle: yes, this is one of the issues
10:37:58 [Pedro]
.. a standard markup may be a better solution, but this needs to be resolved.
10:38:31 [fsasaki]
dag: where would that be? In the content?
10:38:36 [fsasaki]
arle: in the localization content
10:38:46 [fsasaki]
dag: so in XLIFF?
10:39:21 [Pedro]
Arle: XLIFF can be an ideal scenario, but QA can be done in other environments, processes, etc.
10:39:42 [fsasaki]
phil: we were looking into using RDF and putting that into a triple store
10:40:03 [fsasaki]
.. we want to use RDF to do cross silo linking
10:40:06 [Pedro]
Phil: Publishers need also to capture feedback of QA process.
10:40:20 [fsasaki]
10:41:41 [Pedro]
Felix: in software localisation, mechanisms for tracking process status are necessary
10:42:22 [Pedro]
Dave: Need to carefully establish the scope
10:44:04 [Pedro]
Felix: we need to have implementation for this markup
10:45:07 [Pedro]
.. the bottom line is the implementation of the Data Categories, to see them in real apps.
10:46:32 [mlefranc]
quality error description with RDF using Provenance datamodel: [ a its:qualityErrorDescription ; its:qaType "..." ; ... ; prov:wasGeneratedBy [ a its:QAActivity ; ... ; prov:used [ a str:String ; str:anchorOf "verbs agrees"]]]
10:46:34 [Pedro]
Dave: contradiction between the scope and the way of representing only one QA run
10:47:18 [Pedro]
Richard: maybe better not to use span
10:47:34 [micha]
micha has joined #mlwdub
10:47:42 [Pedro]
Arle: a dedicated element can be used, but that gives other problems
10:48:34 [Yves_]
Yves_ has joined #mlwDub
10:50:14 [r12a]
s/better not to use span/better to use a dedicated element rather than span, since it makes it easier to keep separate from the content/
10:50:15 [Pedro]
Tatiana: target users can be also MT training and development
10:51:37 [Pedro]
Arle: additional specifier can be necessary in terms of recognition
10:51:49 [Pedro]
.. more slides
10:51:58 [fsasaki]
10:52:32 [Pedro]
.. profile must be flexible and capable of being used in a global manner
10:53:32 [fsasaki]
(below action is unrelated to this discussion, just so that I don't forget)
10:53:33 [fsasaki]
action: felix to check whether we can use META-SHARE for identifying resources to be used in disambiguation
10:53:45 [Pedro]
.. ex. qualityProfile
10:54:25 [Pedro]
Dave: do you mean by "pass" the result of the QA?
10:55:09 [Pedro]
Arle: Yes, but it is more intended to show what was done as QA
10:56:11 [Pedro]
Phil: you can define errors with a high granularity, but also some scores for more important errors.
10:56:37 [Pedro]
Arle: This is a very verbose markup
10:57:47 [Pedro]
.. implementations need a big effort, so a mechanism to learn how this is being done now is necessary
10:58:28 [Pedro]
.. and it will affect the commitments and timeframe
10:59:13 [Pedro]
.. it is out of scope to standardise the different QA metrics
11:00:11 [Pedro]
Yves: can you specify it making the distinction between consumer and provider?
11:01:29 [Pedro]
Felix and Phil: probably it is not necessary to separate consumer and producer
11:03:06 [fsasaki]
action: Dave to conclude quality discussion with Arle, including examples from existing implementation in CMS-LION - due to mid July
11:03:38 [Pedro]
Phil: there are some metrics that people already capture and use
11:04:03 [Pedro]
Felix: there are many tools that can use this
11:04:29 [Pedro]
End of session
11:04:53 [fsasaki]
Yves will join the discussion, providing input about what current tools do and how that relates to the current proposals
11:04:57 [moran]
moran has joined #mlwdub
11:05:05 [philr]
scribe: philr
11:05:38 [philr]
TOPIC: Terminology metadata
11:06:06 [philr]
Tadej Stajner presenting
11:06:25 [RRSAgent]
I have made the request to generate fsasaki
11:06:44 [philr]
Goal: annotate fragments of text
11:07:09 [philr]
Audiences: content authors, localizers, CMS, MT providers
11:08:39 [philr]
Data categories: Term; Named entity; Disambiguation; Text analysis annotation
11:10:16 [philr]
Annotations provide disambiguation through reference to semantic networks and ontologies
11:10:35 [philr]
11:11:04 [fsasaki]
11:11:48 [philr]
Tadej: use ITS to support HTML5
11:12:20 [philr]
...aid term matching in TM and CAT tools
11:13:20 [philr]
Tadej: are there more challenges?
11:13:53 [nico]
nico has joined #mlwDub
11:14:03 [philr]
felix: will TA approach lead to real-time tagging?
11:14:49 [philr]
tadej: there is a potential for semi-automatic tagging
11:15:41 [horst_kraemer]
horst_kraemer has joined #mlwdub
11:15:42 [philr]
What's being tagged are candidates, not terms themselves until they are human validated
11:16:48 [philr]
Ioannis: not realistic to have full automatic solution
11:16:59 [philr]
...need to find ways for semi-automatic
11:17:32 [philr]
tadej: identification or construction?
11:17:52 [philr]
...hard problem
11:18:44 [philr]
r12a: ITS 1.0 was concerned with term definition, is this different?
11:19:47 [philr]
fsasaki: one difference is disambiguation
11:21:51 [philr]
daveL: ITS 1.0 is a reference
11:22:48 [philr]
tadej: what are dereferencing scenarios?
11:23:17 [philr]
11:24:07 [philr]
...would like to see some kind of retrieval mechanism
11:25:18 [philr]
pedro: candidate proposal interesting
11:26:02 [philr]
...content authors will need tool assistance to mark terms
11:26:45 [philr]
...glossaries are very guarded by companies
11:27:08 [philr]
...need to link/map to proprietary glossaries
11:27:49 [philr]
fsasaki: can this be used by MT providers?
11:28:19 [philr]
MT providers have own methods to disambiguate
11:28:55 [philr]
...problem of extending lexicons online
11:29:27 [philr]
...client terminology supplied with translation task
11:30:16 [philr]
dgroves: open question of how SMT use
11:30:29 [r12a]
i'm wondering whether term definition and cross-language term equivalence should be in the same data category
11:30:39 [philr]
...difficult on-the-fly consumption
11:31:34 [philr]
tadej: does this information help?
11:32:02 [philr]
dgroves: unanswered question
11:32:59 [philr]
dF: on-the-fly more useful for rules engines
11:34:11 [philr]
tatiana: promising initiative.
11:34:24 [philr]
...could be a foundation
11:35:39 [philr]
tatiana: 30% increase in MT quality from terminology related work
11:36:33 [philr]
Are proprietary glossaries not linked to public repositories?
11:37:53 [philr]
Ioannis: term candidates go through a rigorous process of approval - can take months
11:38:03 [philr]
...in enterprises
11:38:36 [dgroves]
To note: the use of terminology in MT research is generally considered a type of domain adaptation
11:39:07 [philr]
...customers need help with terminology
11:40:03 [philr]
Term lifecycle phase attribute?
11:40:46 [philr]
11:41:52 [philr]
tatiana: important to distinguish between acquisition and recognition
11:44:07 [philr]
tadej: annotationAgent special case of provenance
11:44:43 [philr]
...example markup being presented
11:45:01 [RRSAgent]
I have made the request to generate fsasaki
11:45:50 [philr]
Ioannis: we have linking lexicon links
11:46:47 [philr]
Necessary to have more than one term base; product specific; cascading
11:47:02 [philr]
...client specific
11:49:54 [philr]
tadej: stand-off markup cleaner
11:50:27 [philr]
...in favour of inline by default but need portability
11:50:53 [philr]
session closed
11:52:23 [r12a]
rrsagent, draft minutes
11:52:23 [RRSAgent]
I have made the request to generate r12a
11:53:15 [RRSAgent]
I have made the request to generate fsasaki
12:56:03 [ChrisLyons]
ChrisLyons has joined #mlwDub
12:58:21 [Milan]
Milan has joined #mlwdub
12:59:13 [mhellwig]
mhellwig has joined #mlwdub
13:00:08 [Milan]
rrsagent, where am I?
13:00:08 [RRSAgent]
13:01:36 [leroy]
leroy has joined #mlwDub
13:05:26 [Milan]
scribe: Milan
13:05:43 [Milan]
Topic: Updating ITS 1.0
13:06:42 [dF]
dF has joined #mlwDub
13:07:28 [Milan]
Tadej: Managing a lifecycle of terms
13:09:45 [Milan]
.. Confidence of the annotation (named entity)
13:10:13 [Milan]
.. difficult for some approaches
13:12:43 [mlefranc]
mlefranc has joined #mlwdub
13:13:14 [Milan]
.. Disambiguation for distinct words
13:17:08 [dgroves]
dgroves has joined #mlwdub
13:17:23 [Milan]
action: Tadej to create a summary of implementation status of Terminology Metadata Generation
13:18:30 [Jirka]
My notes from representation session are at
13:19:17 [Milan]
Felix: MLW-LT must support all ITS 1.0 and their functionality
13:20:30 [Milan]
Yves: How to distinguish 1.0 in 2.0?
13:20:48 [Milan]
Felix: We will have references (e.g. to Ruby)
13:21:49 [Milan]
Jirka: Prefer to keep Ruby in 2.0
13:24:17 [nico]
nico has joined #mlwDub
13:25:36 [Zakim]
Zakim has left #mlwDub
13:27:48 [Milan]
action: Jirka to summarize usage of Ruby in DocBook
13:28:48 [Milan]
rrsagent, draft minutes
13:28:48 [RRSAgent]
I have made the request to generate Milan
13:30:57 [fsasaki]
fsasaki has joined #mlwdub
13:31:08 [fsasaki]
scribe: fsasaki
13:31:21 [fsasaki]
topic: presentation from alex lik
13:32:14 [fsasaki]
alex: localized publications - instructions for use, release notes, systems messages in a GUI, ...
13:32:38 [fsasaki]
.. full localization of GUI
13:33:18 [fsasaki]
.. tagset for XML - textcontainer for software is XML. If the ITS tagset works for any XML, software localization can benefit from that too
13:33:35 [RRSAgent]
I have made the request to generate fsasaki
13:34:15 [fsasaki]
alex: medical device manufacturers have a highly regulated environment
13:34:43 [fsasaki]
.. local regulations, CFR (QSR), ISO 13485, IEC 60601, directive 93/42/EEC
13:35:08 [fsasaki]
alex: challenges:
13:35:20 [fsasaki]
.. variety of authoring platforms, even within one company
13:35:33 [fsasaki]
.. end user material is in DITA XML
13:35:56 [fsasaki]
.. having a tagset for "any XML" is a good condition
13:36:03 [fsasaki]
.. but we want to go further to have real single sourcing
13:36:24 [fsasaki]
.. modification of underlying format (and in the tagset) leads to changes in the fragmentation
13:36:40 [fsasaki]
.. that leads to changes when we analyse the material for localization price quotes
13:37:02 [fsasaki]
.. there is material that has been translated in XML; when you send it to HTML the price quote is comparable to the original one
13:37:34 [fsasaki]
.. there are materials that are in word documents that are not legal templates
13:38:27 [fsasaki]
.. there are many companies that have mandatory end user material
13:38:31 [fsasaki]
.. and other types of material
13:38:43 [fsasaki]
.. on one hand the separation is logical, but it can also impose problems
13:38:56 [fsasaki]
.. the localization costs can grow tremendously
13:39:38 [fsasaki]
.. having all content in the same repository & container will be helpful
13:40:00 [fsasaki]
.. question of information architecture - do we need to train developers to work with ITS?
13:40:27 [fsasaki]
moritz: depends a bit - one issue we are seeing.
13:40:36 [fsasaki]
.. the end user might not have a technical background
13:40:58 [fsasaki]
moritz: we are looking into finding ways for having interfaces for users
13:41:18 [fsasaki]
alex: I am talking about content managers & information architects
13:42:41 [fsasaki]
felix: these people need to understand XPath at least to be able to write some useful rules
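[Editor's illustration, not part of the discussion] To make felix's point concrete, here is a minimal sketch of an ITS global rule whose selector is an XPath expression, applied with Python's standard library. The sample document, the rule, and the helper name non_translatable are assumptions made for this example. ElementTree supports only a subset of XPath, and rule precedence and inheritance are ignored, so this is a rough sketch rather than a conforming ITS processor.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical rules file: mark every <code> element as not translatable.
RULES = f"""<its:rules xmlns:its="{ITS_NS}" version="1.0">
  <its:translateRule selector="//code" translate="no"/>
</its:rules>"""

# Hypothetical source document.
DOC = "<doc><p>Press <code>Ctrl+C</code> to copy.</p><p>Done.</p></doc>"


def non_translatable(doc_root, rules_root):
    """Return the elements that the rules mark as translate="no"."""
    selected = []
    for rule in rules_root.iter(f"{{{ITS_NS}}}translateRule"):
        if rule.get("translate") == "no":
            # ElementTree needs a relative path, so "//x" becomes ".//x".
            selected.extend(doc_root.findall("." + rule.get("selector")))
    return selected


hits = non_translatable(ET.fromstring(DOC), ET.fromstring(RULES))
print([el.text for el in hits])
```

Whoever writes the rules file has to know that "//code" selects every code element in the document, which is the XPath knowledge felix refers to.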
13:43:15 [fsasaki]
olaf: documentum would produce a lot of pages, but multilingualism is not taken into account
13:43:34 [fsasaki]
.. there is a huge re-training of the information architecture people necessary to understand internationalization
13:43:56 [fsasaki]
alex: thanks a lot for that comment
13:44:20 [fsasaki]
.. back to the challenges - terminology mgmt was mentioned before
13:45:53 [fsasaki]
.. we can have our content in DITA, output in HTML
13:46:42 [fsasaki]
.. re-publishing is important.
13:47:02 [fsasaki]
pedro: some implementations we are doing give you some background:
13:47:16 [fsasaki]
.. you cannot put all the complexity of the localization process into the CMS
13:47:23 [fsasaki]
.. you have to connect through e.g. a gateway
13:47:32 [fsasaki]
.. to a platform that can meet your requirements
13:47:46 [Marion_Shaw]
Marion_Shaw has joined #mlwDub
13:47:58 [fsasaki]
expectations: compatibility with DITA XML
13:48:04 [fsasaki]
.. integrating in SW resource files
13:48:08 [fsasaki]
.. interop with XLIFF
13:48:12 [fsasaki]
.. terminology mgmt
13:48:22 [fsasaki]
.. removal and re-integration of ITS markup
13:48:31 [fsasaki]
.. ease of implementation for tool vendors
13:48:35 [fsasaki]
.. visible ease on the bill
13:48:52 [fsasaki]
alex: my main point is about the XML deep theb
13:49:42 [fsasaki]
.. not so much about CMS
13:51:04 [fsasaki]
.. take-away for me: educational aspects are missing for authors and others
13:51:12 [fsasaki]
topic: presentation from des oates
13:51:17 [RRSAgent]
I have made the request to generate fsasaki
13:51:48 [fsasaki]
des: focusing on the processes that we adopt within Adobe
13:53:16 [fsasaki]
.. I am interested in this group because it covers three domains that are interesting for me:
13:53:24 [fsasaki]
.. content creation, publication, localization
13:53:59 [fsasaki]
.. publishing happens in different ways: raw HTML using CQ5 CMS
13:54:02 [fsasaki]
.. we publish software
13:54:22 [fsasaki]
.. and also documentation
13:54:33 [omstefanov]
omstefanov has joined #mlwdub
13:55:09 [fsasaki]
.. localizable content in source control systems,
13:55:28 [fsasaki]
.. a lot of content in multiple repository formats
13:55:40 [fsasaki]
.. framemaker, CQ5 again, web CMS
13:56:19 [fsasaki]
.. translation services used internally: TMS, another TMS, ...
13:56:41 [fsasaki]
.. adobe translator
13:57:21 [fsasaki]
.. many inputs, many outputs
13:57:24 [fsasaki]
.. a lot of complexity!
13:58:18 [fsasaki]
.. 18 months ago we created an internal mediation layer, connecting authoring / publication / translation together
13:58:25 [fsasaki]
.. you still have the three domains
13:58:44 [fsasaki]
.. they are connected to mediation layer
13:59:06 [fsasaki]
.. that supplies filtering / normalization, leverage, terminology / QA check, MT service
13:59:34 [Jirka]
Jirka has joined #mlwDub
13:59:40 [fsasaki]
.. MT is an abstraction layer that allows various components to be plugged in:
13:59:46 [fsasaki]
.. moses, external MT providers, ...
13:59:54 [fsasaki]
.. we access them all through a set of APIs
14:00:34 [fsasaki]
.. each of the services is a potential consumer or provider of the metadata that we are discussing
14:00:41 [fsasaki]
.. they are decoupled, but they work together in workflows
14:01:00 [RRSAgent]
I have made the request to generate fsasaki
14:01:09 [fsasaki]
des: we have many translation processes
14:01:22 [fsasaki]
.. we have to match different business requirements
14:01:40 [fsasaki]
.. we don't want to create customizations all the time, customizing connectors etc.
14:02:08 [fsasaki]
.. what is the purpose of the mediation
14:02:38 [fsasaki]
.. example: MT workflow: from CMS > XML, normalization process, XLIFF transformation, leverage of XLIFF
14:02:46 [fsasaki]
.. (check if everything is re-usable)
14:03:03 [fsasaki]
.. if it is not re-usable it will not be propagated through the workflow
14:03:38 [fsasaki]
.. after machine translation, content goes to post editors
14:03:52 [fsasaki]
.. after that content goes to XML and CMS (HTML)
14:04:04 [fsasaki]
.. that's a typical workflow that we deploy with our platform
14:04:11 [fsasaki]
.. where would the metadata be important?
14:04:31 [fsasaki]
.. in MT service: translate, in TMS: loc note and disambiguation
14:04:39 [fsasaki]
.. above are just examples
14:05:00 [fsasaki]
.. another example workflow: user generated content, also with XLIFF and MT
14:05:05 [RRSAgent]
I have made the request to generate fsasaki
14:05:35 [fsasaki]
.. above is a real time process: user says "I want to have page in a different language", clicks, and gets the content
14:05:54 [fsasaki]
.. important here: translate, disambiguation, provenance metadata
14:06:15 [fsasaki]
.. we need the metadata for our SOA based localization
14:06:34 [fsasaki]
.. without a standard form of metadata we will lose data
14:06:44 [fsasaki]
.. provenance is important
14:06:50 [fsasaki]
.. ITS 2.0 should solve parts of these problems
14:07:11 [RRSAgent]
I have made the request to generate fsasaki
14:07:17 [fsasaki]
des: beyond the data categories
14:07:52 [fsasaki]
.. I have an additional set of requirements, in addition to markup / attributes etc.
14:08:18 [fsasaki]
.. it should be straightforward to establish which subset of ITS 2.0 an implementation supports
14:08:39 [fsasaki]
.. in SOA, we need to know what metadata a system supports
14:09:03 [fsasaki]
.. that's orthogonal to conformance with the data model
14:09:36 [fsasaki]
.. unknown domains and organizations' private use of data categories should be considered
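[Editor's illustration, not part of the discussion] A minimal sketch of the requirement des describes: each service in a workflow declares which data categories it supports, so the workflow owner can see in advance where metadata would be lost. The service names, the category sets, and the function dropped_categories are hypothetical and do not describe Adobe's platform or any ITS 2.0 mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    supported: frozenset  # data categories the service claims to preserve


def dropped_categories(workflow, required):
    """Map each service name to the required categories it does not support."""
    report = {}
    for service in workflow:
        missing = set(required) - service.supported
        if missing:
            report[service.name] = sorted(missing)
    return report


# Hypothetical workflow; names and capabilities are made up for illustration.
workflow = [
    Service("normalizer", frozenset({"translate", "locNote", "provenance"})),
    Service("mt", frozenset({"translate"})),
    Service("tms", frozenset({"translate", "locNote"})),
]
required = {"translate", "provenance"}

report = dropped_categories(workflow, required)
print(report)
```

A machine-readable declaration of this kind is one possible reading of "which subset of ITS 2.0 an implementation supports"; the log does not say which form the group would choose.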
14:10:08 [fsasaki]
des: beyond / in concert with ITS2.0
14:10:26 [fsasaki]
.. ITS 2.0 will solve parts of the problem, other components will need to be addressed too
14:10:39 [fsasaki]
.. standardization of content packaging, see e.g. linport project
14:10:50 [gderiard]
gderiard has joined #mlwDub
14:11:12 [fsasaki]
des: standardization of service boundaries
14:11:33 [Arle]
Here service boundaries = standard APIs for the various services.
14:12:13 [fsasaki]
des: clear opportunity of standardization of APIs
14:12:28 [fsasaki]
.. could help integrate terminology systems with workflows
14:13:42 [fsasaki]
dave: really enjoyed your diagrams, want to re-use them for our use-case document
14:13:48 [fsasaki]
.. also agree about service boundaries
14:14:01 [fsasaki]
.. not sure if there is a deliverable of the working group
14:14:33 [fsasaki]
.. but having examples / slides like that to communicate the problem is very helpful
14:15:09 [fsasaki]
des: yes, understand that at the moment it is out of scope, but we need to assure that it is taken up at some point
14:15:31 [fsasaki]
dave: I also agree that for a service you need a clean way to state what data categories you support
14:16:19 [fsasaki]
des: it is part of the business agreement, needs to be very clear
14:17:23 [Arle]
Felix: We have the conformance statements for 1.0; for 2.0 we need to make it clearer that someone implementing must have these statements. We should be clearer about what must be provided: e.g., machine-readable, human-readable, various implementations
14:18:16 [fsasaki]
dave: some things are important for the informative part of the spec
14:18:31 [fsasaki]
.. e.g. in very clear best practices documents, see richard's example
14:19:23 [fsasaki]
xyz: in the "example workflow", is there a way to transform various formats into HTML?
14:19:47 [fsasaki]
des: xliff is the interchange format that we use across the platforms
14:20:10 [fsasaki]
.. normally we would use XLIFF to translate data from one service to another service
14:20:26 [r12a]
r12a has left #mlwdub
14:20:30 [RRSAgent]
I have made the request to generate fsasaki
14:20:50 [r12a]
r12a has joined #mlwdub
14:20:50 [fsasaki]
14:20:59 [fsasaki]
davidF: XML is a transition format
14:21:14 [fsasaki]
.. if you start with HTML you will have HTML, same for other formats
14:21:27 [RRSAgent]
I have made the request to generate fsasaki
14:21:39 [fsasaki]
daveF: XSLT is the transformation language
14:21:52 [fsasaki]
kerstin: would it make sense to convert the lexicons into HTML already?
14:22:06 [fsasaki]
des: don't see the use case for HTML, since that's a publication format
14:23:15 [fsasaki]
.. if you have word or pagemaker etc., you just need a filter to convert things to XLIFF or another interchange format
14:23:20 [fsasaki]
.. that's the rationale for the conversion
14:24:19 [fsasaki]
moritz: one issue - what if we get metadata that we don't support?
14:24:46 [fsasaki]
des: it is important to know what you expect
14:25:27 [Arle]
This is an interesting issue: when can you strip metadata? What happens when you get back metadata invalid for your domain?
14:25:29 [fsasaki]
moritz: should metadata be stripped out?
14:25:37 [Arle]
14:25:38 [fsasaki]
davidF: it is important to have defaults
14:25:56 [fsasaki]
.. there are ways around it even if you don't support everything
14:26:31 [Arle]
Felix: One last point. I will take an action point to come up with examples of implementations and what they can do. See if they fulfill your requirements.
14:27:06 [fsasaki]
action: felix to come up with example of SOA related presentation of metadata capabilities for des' requirement
14:27:50 [fsasaki]
moritz: an ontology of process states - can we agree on that?
14:27:58 [fsasaki]
dave: we cannot standardize the process
14:28:09 [fsasaki]
.. we can just try to "normalize the language"
14:28:22 [fsasaki]
.. comes down to people like des and dag who have a whole view on the process
14:28:45 [RRSAgent]
I have made the request to generate fsasaki
14:29:08 [Arle]
Agreed with Dave. The moment you claim to standardize the process, it creates problems. What you can do, however, is standardize the boundaries (per Des). Treat the process as a black box in your definitions, but one with well defined inputs and outputs. E.g. don't tell localization HOW to do things, but you can say what it must produce at the end of the day.
14:29:56 [fsasaki]
action: moritz, dave and others to look into process areas
14:30:14 [fsasaki]
davidF: interest is cooperation
14:31:47 [fsasaki]
felix: need to make sure that we resolve this in a timely manner
14:32:03 [fsasaki]
dave: this is also a part of our public relations work, not so much the data categories
14:32:12 [fsasaki]
.. people in the working group are familiar with the terminology
14:34:08 [Arle]
Felix: We need to have things resolved ASAP, by July.
14:34:28 [fsasaki]
davidF: will work on that
15:01:25 [moran]
moran has joined #mlwdub
15:02:59 [dF]
dF has joined #mlwDub
15:03:19 [dF_]
dF_ has joined #mlwdub
15:03:31 [gderiard]
gderiard has joined #mlwDub
15:08:23 [tadej]
scribe: tadej
15:08:27 [tadej]
15:08:45 [tadej]
topic: Bryan Schnabel: XLIFF Extensibility and Metadata
15:08:55 [RRSAgent]
I have made the request to generate fsasaki
15:10:53 [RRSAgent]
I have made the request to generate fsasaki
15:12:07 [tadej]
Bryan: There are three main ways to extend XLIFF1.2: elements, attributes, attribute values
15:12:14 [RRSAgent]
I have made the request to generate fsasaki
15:13:06 [tadej]
... with elements, you can use the usual XML namespace declaration mechanism
15:13:45 [tadej]
... similarly with attributes, where it is allowed.
15:14:25 [mhellwig]
mhellwig has joined #mlwdub
15:14:26 [tadej]
... with attribute values, you can prepend x- to your value where none of the existing options work for your use case.
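[Editor's illustration, not part of the presentation] The three extension points Bryan lists can be sketched with Python's standard library. The XLIFF 1.2 namespace URI is the real one; the extension namespace, the "my" prefix, and the names reviewer, layout and x-sidebar are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
EXT_NS = "http://example.com/my-extension"  # hypothetical custom namespace

ET.register_namespace("xlf", XLIFF_NS)
ET.register_namespace("my", EXT_NS)


def q(ns, name):
    """Build a namespace-qualified name in ElementTree's {uri}local form."""
    return f"{{{ns}}}{name}"


unit = ET.Element(q(XLIFF_NS, "trans-unit"), {
    "id": "1",
    "restype": "x-sidebar",         # attribute value: user-defined, x- prefix
    q(EXT_NS, "reviewer"): "jdoe",  # attribute from the custom namespace
})
source = ET.SubElement(unit, q(XLIFF_NS, "source"))
source.text = "Hello"
layout = ET.SubElement(unit, q(EXT_NS, "layout"))  # custom-namespace element
layout.set("width", "120")

xml_text = ET.tostring(unit, encoding="unicode")
print(xml_text)
```

A tool that does not know the custom namespace can still read the source text, which is the usual argument for namespace-based extension.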
15:16:09 [tadej]
... There is a DITA Open Toolkit XLIFF/DITA roundtripping tool you can use
15:17:30 [micha]
micha has joined #mlwdub
15:18:30 [tadej]
... It's implemented as an Ant tool, and the tools also keep the original PDF to preserve context of the content
15:19:19 [tadej]
... The roundtripping XSLT required two distinct operational modes: the skeleton mode and the body mode.
15:19:53 [tadej]
... A custom namespace preserves the formatting information, so it is fully reconstructible.
15:21:13 [tadej]
... Here, I'm demonstrating a Drupal module for this.
15:22:32 [tadej]
... All Drupal information gets stored in a custom namespace. Some of the metadata also shows up in the XLIFF namespace. After the XLIFF file is translated, the plugin can import it and everything works.
15:23:37 [RRSAgent]
I have made the request to generate fsasaki
15:24:08 [tadej]
... In XLIFF 2.0, the extensibility was more restricted. Previously, people re-implemented XLIFF functionality in their extension, which we do not want.
15:24:59 [tadej]
... We then allowed custom namespaces for elements that are not already in XLIFF 2.0.
15:26:25 [tadej]
... After discussion, the WG vote tied on allowing elements-only and elements-and-custom-namespaces.
15:27:16 [Arle]
Just as a reminder to folks, I am posting all slides here:
15:27:42 [tadej]
Bryan: The tie still remains, the TC will decide shortly - you all can also get involved and influence the discussion.
15:30:28 [mlefranc]
mlefranc has joined #mlwdub
15:31:04 [tadej]
r12a: What about backward compatibility? Will XLIFF1.2 users be stranded?
15:31:26 [tadej]
dF: XLIFF is a perishable transport format, it is usually not persisted.
15:31:58 [tadej]
fsasaki: If during the generation of the metadata some extensibility is used in a particular use case, the whole extensibility layer needs to be rewritten for 2.0
15:32:39 [tadej]
Yves_: For more complex metadata, we want to structure elements in the document and carry that information forward.
15:33:03 [tadej]
DagS: How do you map ITS to XLIFF? i.e. how to map a <term> tag to XLIFF?
15:33:47 [tadej]
Yves_: For <term>, there is a specific element that is compatible. As long as we don't do complex terminology stuff, it's fine. If we do complex things with other data categories, we can use namespaces to extend.
15:34:43 [tadej]
Yves_: In XLIFF1.2, you could use ITS as a namespace extension.
15:35:50 [tadej]
daveL: The extensibility seems to have implications for validation, making it more complex. For instance, implementers should be able to say which parts are supported in their implementations.
15:36:21 [tadej]
... Do you have criteria for what kinds of extensions are "acceptable" to you?
15:36:54 [tadej]
Yves_: Even internally in XLIFF, many components are modular and live in module namespaces which extend the core.
15:37:28 [tadej]
Yves_: That strategy could be good for supporting ITS tags within XLIFF.
15:38:39 [tadej]
Bryan: If the current extensibility strategy doesn't work for you, you can raise your voice.
15:39:09 [tadej]
DagS: If it goes in this direction of restricting extensibility, that is worrisome for the ITS ecosystem.
15:40:52 [tadej]
Des: We spent a long time shaping the requirements in the content domain, and we need to spend some time with supporting ITS in the localization domain. XLIFF is the workhorse there, and ITS needs to be integrated with XLIFF 2.0
15:41:30 [tadej]
DagS: Is restricting extensibility really going to help with the new implementations?
15:42:08 [tadej]
Yves_: One unsolved issue in 1.2 was segmentation representation. However, there is a mapping from 1.2 to 2.0.
15:42:28 [RRSAgent]
I have made the request to generate fsasaki
15:43:47 [tadej]
Bryan: The people against extensibility are just listening to the community, which voiced legitimate concerns about extensibility. The opposition's opinion is that the problem was not extensibility, but misuse of it. We can solve this by enforcing conformance clauses and setting expectations.
15:44:00 [tadej]
DagS: Can there be a special case for ITS?
15:44:20 [tadej]
Bryan: Possibly - see mailing list for XLIFF.
15:45:19 [tadej]
r12a: They should be closely tied. You don't need to use its: for using our data categories, but you can also integrate the data categories into the XLIFF markup and provide conformance support.
15:45:34 [tadej]
Bryan: That's not being planned, but people would listen to these suggestions.
15:46:42 [tadej]
dF: There will be custom namespaces. Even if XLIFF by default won't have the extensibility, ITS can still be suggested as an OASIS namespace and module.
15:46:58 [tadej]
... There is a proposed feature in XLIFF2.0 for ITS support.
15:48:12 [tadej]
r12a: I would really like to move this even closer: I realize it's still an optional extension. Ideally, support would be integrated in the core. This doesn't affect inline markup?
15:49:25 [tadej]
fsasaki: There are other communities that are affected by this policy.
15:49:31 [tadej]
Yves_: Yes, like TBX.
15:50:15 [tadej]
daveL: Do we have any consensus here? How many people would like to say "Please keep namespaces in 2.0"?
15:51:02 [tadej]
DagS: We want to have XLIFF support ITS. The exact implementation is not that important (namespaces, modules, etc.)
15:51:41 [tadej]
Bryan: If you feel passionate about that, comment yourself.
15:51:44 [fsasaki]
action: felix to write a mail to XLIFF tc, check that on Wednesday morning again
15:52:43 [tadej]
topic: Localization Requirements
15:53:14 [tadej]
Yves_: We introduce the data category idValue as a selector of content.
15:53:49 [tadej]
... Some discussion was about what do we need to identify? Segments? How do we select the ids?
15:54:57 [tadej]
mlefranc: This could be relevant to the provenance discussion tomorrow.
15:55:21 [tadej]
daveL: We need to be able to point to a portion of text when we talk about provenance.
15:56:06 [tadej]
... There are several ways: introducing new <span> tags, re-using existing markup, or doing completely stand-off annotation.
15:56:32 [tadej]
Yves_: I am referring to existing IDs in the document, which should be persistent throughout the process.
15:56:51 [tadej]
... Data categories are discrete, the more orthogonal the better.
15:57:26 [tadej]
daveL: How important is this requirement for mapping translation units?
15:57:47 [tadej]
Yves_: It's important for localization, but XLIFF has its own id space.
15:58:40 [kurzum]
kurzum has joined #mlwdub
15:59:43 [tadej]
fsasaki: Let's continue this discussion tomorrow, there's also the targetPointer debate.
16:00:18 [daveL]
scribe: daveL
16:00:57 [daveL]
TOPIC: BCP 47 Developments
16:01:22 [daveL]
Mark Davis presents remotely
16:02:41 [daveL]
...introduces unicode locale/lang ID which is based on BCP47
16:03:15 [daveL]
...there are extensions to these codes
16:04:20 [daveL]
...extension U relates to locales with various calendars, phone number formats, digit sets etc.
16:04:47 [daveL]
... e.g. Arabic with Arabic numbers or Western numbers
16:05:33 [daveL]
... t extensions indicate a transform of the content, see RFC 6497
16:05:55 [daveL]
... e.g. transliteration, translation, transcription
16:06:52 [daveL]
... intended for interchange circumstances where there is no structured way of expressing the transform
16:08:24 [daveL]
... options can indicate transform mechanism, input method, keyboard method and a specific one for machine translation - plus ones for private use
16:09:28 [daveL]
... resources are available for choosing language tags, extension fields and subfields
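[Editor's illustration, not part of the presentation] A minimal sketch of how the "u" and "t" extensions sit inside a BCP 47 tag. The function names are assumptions for this example, and the parser assumes well-formed, non-grandfathered tags; it does not validate subtags against any registry. The field key pattern (one letter followed by one digit) follows the field separator syntax of RFC 6497.

```python
import re

FIELD_KEY = re.compile(r"^[a-z][0-9]$")  # e.g. "m0": one letter, one digit


def split_extensions(tag):
    """Split a language tag into its base part and its singleton extensions."""
    base, extensions, current = [], {}, None
    for sub in tag.lower().split("-"):
        if len(sub) == 1 and current != "x":
            current = sub             # a singleton starts a new extension
            extensions[current] = []
        elif current is None:
            base.append(sub)          # still inside the plain language tag
        else:
            extensions[current].append(sub)
    return "-".join(base), extensions


def parse_t(subtags):
    """Split 't' extension subtags into the source language tag and fields."""
    source, fields, key = [], {}, None
    for sub in subtags:
        if FIELD_KEY.match(sub):
            key = sub
            fields[key] = []
        elif key is None:
            source.append(sub)
        else:
            fields[key].append(sub)
    return "-".join(source), fields


# Arabic content using Western (Latin) digits.
print(split_extensions("ar-u-nu-latn"))

# Cyrillic content transformed from Latin, with a mechanism field.
base, ext = split_extensions("und-Cyrl-t-und-latn-m0-ungegn-2007")
print(base, parse_t(ext["t"]))
```

An application that does not understand the extension can keep only the base part and still get a usable plain language tag.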
16:11:53 [daveL]
Yves asks for confirmation that the ID is both an indication of future action and can be used to report past actions
16:12:17 [daveL]
Mark suggests using differing tag codes for these different uses
16:13:12 [daveL]
... but confirms that the coding scheme could be used for either
16:14:01 [daveL]
Dag asks for confirmation that this is not intended to replace the locale tag
16:15:53 [daveL]
Mark states this could be used to tag content as being in one language but additionally that it was machine translated from English
16:16:45 [daveL]
Dag is concerned that if this replaced the locale tag it could confuse implementers, overriding existing functions that understand straight BCP language tags
16:18:14 [daveL]
Mark responds that the tag could be ignored if it is not understood by the application
16:19:06 [daveL]
David Filip reiterated concern that overloading the tag for both instructions and reporting would be dangerous
16:20:06 [daveL]
Mark responds that it's up to the author of the content to decide how to use the tag
16:20:43 [daveL]
David expanded to ask if the usage should be context driven
16:22:09 [daveL]
Mark responds that it can be used for request and response, and response doesn't have to conform to request
16:22:27 [daveL]
... but does expect to use this in a richer environment, e.g. an XLIFF document
16:23:29 [daveL]
r12a states that the attribute should dictate whether this is information or an instruction
16:24:14 [daveL]
Jirka asks if there had been consideration of using such tags in http headers
16:24:51 [daveL]
Mark states that the intention is this should be used in a very wide range of circumstances and languages
16:25:27 [daveL]
... purpose for putting it in BCP47 is to support this wide variety of contexts
16:27:02 [daveL]
Felix asks if process tracing data would use this tag, or would it be in more structured markup
16:28:26 [daveL]
Mark responds that this could be used in other standards, and perhaps the constituent codes could be used
16:29:09 [daveL]
Felix asks for some guidance on further usage of this extension; he will follow up.
16:29:35 [daveL]
rrsagent, generate minutes
16:29:35 [RRSAgent]
I have made the request to generate daveL
16:29:52 [daveL]
TOPIC: wrap-up for today
16:29:54 [fsasaki]
action: felix to follow up on bcp 47 "t" guidance in i18n core working group
16:30:28 [daveL]
Arle encourages people not already in the group to get more involved in the WG
16:30:59 [r12a]
i also wanted to make the point that lang= or xml:lang= in a web page ONLY means that the content within the associated element is in a particular language - if you want to indicate some other thing, such as the language of an external resource or a request for information in a particular language (see
16:31:11 [daveL]
Arle recognises sponsorship of CNGL in this event
16:31:17 [r12a]
this is something that often trips up working groups
16:31:31 [daveL]
... and highlights next MLW workshop in March 2013 in Rome
16:31:39 [fsasaki]
link to locworld event at
16:31:48 [RRSAgent]
I have made the request to generate fsasaki
16:32:02 [dF]
dF has joined #mlwdub
16:32:48 [daveL]
... and there is a workshop planned on cross institute interoperability called FEISGILTT, colocated with LocWorld in Seattle in Oct 2012
16:33:18 [dF]
FEISGILTT 2012 call for papers:
16:33:32 [RRSAgent]
I have made the request to generate fsasaki
16:33:35 [daveL]
...thank you, and see you tonight in the Turk's Head and tomorrow for the second day
16:33:43 [daveL]
rrsagent, generate minutes
16:33:43 [RRSAgent]
I have made the request to generate daveL