07:40:04 RRSAgent has joined #mlwDub
07:40:04 logging to http://www.w3.org/2012/06/12-mlwDub-irc
07:44:51 Meeting: MultilingualWeb-LT Workshop
07:46:46 Agenda: http://www.multilingualweb.eu/en/documents/dublin-workshop/dublin-program
07:48:31 Scribe: Yves
07:48:35 fsasaki has joined #mlwdub
07:48:38 ScribeNick: Yves_
07:48:55 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html fsasaki
07:49:17 daveL has joined #mlwDub
07:49:36 present: many, manyy, manyyy, people
07:49:39 chair: Arle
07:49:45 scribe: Yves_
07:49:57 topic: intro
07:50:00 meeting not started yet
07:53:13 fsasaki has joined #mlwdub
07:53:28 q+
07:53:31 q-
07:53:43 philr has joined #mlwdub
07:55:31 Milan has joined #mlwDub
08:04:51 leroy has joined #mlwDub
08:05:05 Milan has joined #mlwdub
08:05:28 Arle has joined #mlwdub
08:07:29 felix: today we'll work on the ITS 2.0 requirements
08:07:40 .. first let's introduce ourselves
08:08:09 tadej has joined #mlwDub
08:08:19 mhellwig has joined #mlwdub
08:08:33 Pedro has joined #mlwDub
08:08:34 moran has joined #mlwdub
08:08:45 Rob has joined #mlwdub
08:08:49 Sebastian has joined #mlwdub
08:08:56 Bryans has joined #mlwdub
08:08:59 omstefanov has joined #mlwdub
08:09:05 XavierMaza_GALA has joined #mlwdub
08:09:06 nico has joined #mlwDub
08:09:56 Phil Ritchie, CTO at VistaTEC. Industrial partner in CNGL and MLW-LT. Within MLW-LT, interested in the encapsulation of linguistic quality information and provenance within metadata.
08:10:33 ChrisLyons has joined #mlwDub
08:10:55 peter has joined #mlwdub
08:11:32 jakob has joined #mlwdub
08:12:05 Des has joined #mlwdub
08:15:37 thomas has joined #mlwDub
08:17:17 Sebastian Sklarss, Interoperability and Open Data Consultant at the medium-sized, privately owned company ]init[ (www.init.eu). My colleague Horst Kraemer will join later. Interested in implementing ITS in our customers' CMSs.
08:17:24 Xavier Maza, Language Services Manager at iDISC and GALA (Globalization and Localization Association) board member, interested in hearing from you to take back to our membership.
08:17:54 Felix: ITS 2.0 was started at the beginning of this year
08:18:34 ... some people may not know ITS 1.0 very well, so I'll try to summarize it
08:19:02 .. ITS defines "data categories"
08:19:44 .. they are separate items (not necessarily related), allowing flexibility
08:20:00 .. we provide non-application-specific definitions
08:20:18 .. it's OK to implement only some data categories, not all
08:20:53 .. for example the Translate data category
08:21:17 .. you can express it locally (its:translate on an element)
08:21:45 .. HTML5 also implements that data category: the 'translate' attribute
08:22:11 .. it's easy to map the implementations
08:22:23 Tony has joined #mlwdub
08:22:30 .. In addition, ITS offers the "global" approach
08:23:11 .. ITS 1.0 offers global rules using XPath selectors that select the nodes to which the data applies
08:23:42 dgroves has joined #mlwdub
08:24:05 r12a has joined #mlwdub
08:24:17 .. you can compare this to CSS: defaults, rules in files, rules in the document itself, and locally as well.
08:24:42 .. In ITS 2.0 we want to apply ITS in HTML5, CMS content, etc.
08:25:08 .. we also want to have some bridges to the Semantic Web
08:26:53 Felix shows Richard's test for the HTML5 translate attribute using different systems.
08:28:34 http://www.w3.org/International/tests/html-css/translate/results-online
08:28:47 Felix: ITS 1.0 has 7 data categories, focused on XML
08:29:08 note that these test results need updating - last week i found out that MS now produces positive results for all tests
08:29:36 dF has joined #mlwDub
08:29:50 ie. .......... .... now works
08:33:52 Yves_ has joined #mlwdub
08:34:00 ITS 2.0 Disambiguation would allow linking to thesauri, etc. for MT.
08:34:07 scribeNick: yves_
08:34:21 s/ITS 2.0/Felix: ITS 2.0/
08:35:08 Felix: other data categories: text analysis annotation
08:36:37 David Filip: How is ITS 1.0 term different from disambiguation?
08:36:56 Yves__ has joined #mlwdub
08:37:35 Felix: term is not application specific; it is a general item. Disambiguation data is specific to this and ties to resources specifically for the purpose of disambiguation. But we need to discuss these details to finalize our work.
08:37:54 .. We need disambiguation in other areas, so this is designed for that purpose.
08:38:29 Richard: There were some other categories in ITS 1.0 you didn't show in the ITS 2.0 slide. Will they be dropped?
08:39:40 Felix: It was just that nobody showed interest in working on them here. But because the data categories are independent, we don't have to deal with them. They may be handled elsewhere. But in any event, we keep ITS 1.0 categories. We may point to them somewhere else or develop them further.
08:40:24 Yves_ has joined #mlwDub
08:40:24 Richard: I'm worried that we might lose important things like directionality. It is useful for people using XML to have guidance. We don't want to drop them.
08:40:58 Felix: The list I showed is taken from the ToC of ITS 1.0. ITS 2.0 will contain all of them and then add more. So all of them will remain in ITS.
08:41:28 .. We will give guidance for what to do; the actual specification may point to work in another working group, but we don't drop it.
08:41:40 .. E.g., the HTML5 working group might define parts of these.
08:42:09 Dave Lewis: The lists you showed are snapshots from today. All it means is that there was discussion about some points and not about others. It's where we are today, but it can be changed.
08:43:19 Olaf-Michael Stefanov: Just because something from ITS 1.0 is not on the ITS 2.0 list does not mean it will be dropped. Just because our group doesn't implement it does not mean we can't refer to other specifications for those points.
08:43:39 s/Olaf-Michael Stefanov/Felix/
08:44:12 Felix: We need to get to concrete details to find consensus and implementation commitments.
08:44:32 .. We need to decide how to implement these categories in various formats.
08:44:52 .. We have consensus on a small set of ideas, but for others we only have ideas, so we need to come to consensus.
08:45:44 .. Our time-frame is that we need the general framework by the end of July. That does not mean all details need to be sorted out, but we need to have the list nailed down, with a list of what is to "be in the basket". We need a draft by October and a stable draft by November.
08:47:12 .. The group is funded by the EU. We need to be as implementation-driven as possible. E.g., the translate data category really helps convey the message about what can be done and also shows issues. If you follow Richard's test, it shows issues with nesting of different translate states. That is not handled yet. By prototyping simple categories we can tell what is feasible.
08:47:55 .. For participants, please think about what you really want to work on before the summer break. Do it for at least HTML5 and XML, using both local and global (XPath) markup.
08:48:09 .. Also, engage customers to see what they want to do. Use real-world testing.
08:48:49 .. It is a chaotic process. Start with playing with stuff to see what works. When I say play and prototyping, those outside the group might ask what sorts of implementations might be produced.
08:49:53 .. Go to the MLW-LT homepage and see what deliverables are needed. It shows the areas where we expect to see implementations, e.g., Drupal for CMS by Cocomore, annotation by Tadej, in MT (Linguaserve, DCU), annotation of MT data, quality (Phil), etc.
08:50:19 .. At the end of the day we need stable implementations created using the EU funding.
08:50:29 .. We need more implementation experience.
08:51:04 Richard: Microsoft Translator does now support nested translate attributes properly. I just tested it and it worked.
08:51:06 http://www.w3.org/International/multilingualweb/lt/wiki/Deliverables
08:52:01 Felix: The purpose is to get your ideas and commitments. Look at the requirements document for the ones where we have consensus. We need to understand what is important for which community. We also want to see how you can make money, by weighing the value against the cost of changing to use these.
08:52:18 .. Find where it makes the most sense/value. We need business case-level arguments.
08:53:07 .. The group is moving forward a lot. The chairs and participants provide some pointers for the discussion aligned with the sessions. Use the mail you got to help guide the discussion.
08:53:29 Richard: What's the process for stating that you like a category and deciding whether it is in or out?
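Felix's overview above describes two ways to express the Translate data category: locally (its:translate on an element) and globally (XPath-selected rules), with CSS-like precedence, plus the open issue of nested translate states. The following is a minimal sketch of how a consumer might resolve the effective value per element, assuming this simplified precedence: local its:translate, then the last matching global rule, then the value inherited from the parent, then the default "yes" for element content. The function name `resolve_translate`, the sample document, and the rule list are hypothetical; real ITS global rules use full XPath 1.0 (e.g. `//code`), whereas Python's ElementTree only supports a subset, so the sketch uses relative paths such as `.//code`. The HTML5 `translate` attribute is not handled here; consult the ITS 1.0 specification for the normative rules.

```python
import xml.etree.ElementTree as ET

# ITS 1.0 namespace; its:translate parses to this qualified attribute name.
ITS_NS = "http://www.w3.org/2005/11/its"
ITS_TRANSLATE = "{%s}translate" % ITS_NS


def resolve_translate(root, global_rules):
    """Return {element: "yes" | "no"} for every element under root.

    global_rules: list of (selector, value) pairs; later rules win.
    Selectors are ElementTree paths (an assumed stand-in for XPath 1.0).
    """
    rule_hits = {}
    for selector, value in global_rules:
        for el in root.findall(selector):
            rule_hits[el] = value  # a later rule overrides an earlier one

    result = {}

    def walk(el, inherited):
        local = el.get(ITS_TRANSLATE)
        if local is not None:      # 1. local markup wins
            value = local
        elif el in rule_hits:      # 2. then a matching global rule
            value = rule_hits[el]
        else:                      # 3. otherwise inherit from the parent
            value = inherited
        result[el] = value
        for child in el:
            walk(child, value)

    walk(root, "yes")              # 4. assumed default for element content
    return result


# Hypothetical sample: a global rule marks <code> as not translatable, and a
# nested local override re-enables translation inside a "no" paragraph.
DOC = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <title>User Guide</title>
  <p>Run <code>make install</code> first.</p>
  <p its:translate="no">ACME <em its:translate="yes">Widget</em></p>
</doc>"""

if __name__ == "__main__":
    root = ET.fromstring(DOC)
    for el, value in resolve_translate(root, [(".//code", "no")]).items():
        print(el.tag, value)
```

The nested `<em>` inside the "no" paragraph is the kind of case Richard's test exercises: the local value on the child takes precedence over the value it would otherwise inherit.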
08:54:10 Felix: Join the IRC channel and, when we discuss the categories and implementation, mention your support and concerns there. After the meeting we will analyze the comments to see what we need to take into account and what people supported.
08:54:14 mlefranc has joined #mlwdub
08:54:54 Yves_ has joined #mlwDub
08:54:58 Topic: Work session on representation formats
08:56:17 Jirka: Maxime will talk about issues and then I will discuss other issues.
08:56:25 Topic: Maxime's presentation
08:56:33 iprause has joined #mlwDub
08:57:44 scribe: fsasaki
08:57:49 micha has joined #mlwdub
08:58:17 PaulMac has joined #mlwDub
08:58:29 Jirka has joined #mlwDub
08:58:43 kurzum has joined #mlwdub
08:58:55 maxime: RDFa representation format - drop as a requirement?
08:59:06 .. no, since RDFa mapping of data categories is in the working group charter
08:59:49 .. different conceptualization: RDFa is for statements embedded in HTML, ITS is about a specific piece of content
09:00:12 .. Sebastian Hellmann proposed the NIF format to have context-based URIs
09:00:40 .. two approaches to generate URIs: hash based or XPointer based
09:01:04 i guess a link to the requirements doc would be useful: http://www.w3.org/TR/its2req
09:02:22 .. comparison of what can be selected with NIF, CSS selectors, XPath 1.0 / 2.0, XPointer
09:02:24 iprause has joined #mlwDub
09:04:21 maxime: XPointer 1.0 has a small extension to XPath that is hard to implement
09:04:41 .. big issue in RDFa - how to deal with inheritance and overriding
09:05:03 .. probably out of scope for us
09:05:13 .. CURIEs - use URIs with less verbosity
09:05:37 I have a concern about the statement about MUST support. It runs into compliance issues for us since you only need to implement one data category in one format to claim compliance. If you are interested only in implementing translate in HTML5, this proposal would seem to require you to support stuff you don't care about or actually need.
09:06:00 Arle, where is MUST written? In Maxime's presentation?
09:06:07 Yes.
09:06:48 Pedro has joined #mlwDub
09:07:02 maxime: consumers of ITS could use CURIEs to shorten URIs
09:07:18 .. inspiration from the Provenance WG: they deal with XML and RDF at the same time
09:07:35 .. PROV data model, PROV ontology,
09:07:43 .. suggestion would be to have multiple facets
09:07:57 Slide 11: "ITS 2.0 implementations MUST implement XPointer"
09:08:03 .. ITS data model, ITS-XML, ITS-O (Ontology with mapping ITS to RDF)
09:08:11 .. ITS-HTML, its-* attributes
09:08:15 .. ITS-HTML-RDFa
09:08:21 .. ITS-HTML-Microdata
09:09:12 maxime: provenance model relates agents and activities
09:09:25 .. e.g. "translator leads LT-activities on fragments of text"
09:09:48 .. suggestion to define prov:Organization, prov:Person, prov:SoftwareAgent, ...
09:10:21 .. as agents; activities are human translation, machine translation, quality assessment
09:12:28 maxime: issues of local ITS annotation: can't express complex sets of ITS attributes, e.g. its-* elements
09:12:44 .. possible solutions: write ITS-XML directly in a script element or a head element
09:13:50 .. finally - about the ITS namespace, should it be kept?
09:14:18 TCD's CMS LION implementation and its use of provenance documented in recent LREC paper: http://www.w3.org/International/multilingualweb/lt/wiki/images/b/b6/LREC-lewis.pdf
09:14:34 .. can be kept if you use ITS with content negotiation, example with SKOS that is always redirected to the latest version of a schema
09:14:58 .. for us, we would have a URI of the ITS 2.0 specification
09:15:33 .. and content negotiation for various schemas
09:17:18 Yalemisew has joined #mlwdub
09:17:29 gderiard has joined #mlwdub
09:17:46 q+ The presentation will be available in the wiki?
09:18:42 q-
09:18:48 pedro, yes, it will be in the wiki
09:19:26 to [11:05] , I agree, that would only be needed for data categories that have CURIEs as Datatype
09:20:14 jirka is describing how representation is done so far for XML in ITS 1.0 and HTML5
09:21:28 jirka: after discussion with the HTML working group, we decided to use its- attributes
09:21:45 q+ to mention importance of cooperation with other WGs wrt representation of ITS
09:22:42 maxime, thanks for clarifying. That wasn't necessarily clear from the slide, where it looked like a general requirement for ITS 2.0 conformance. So the proposal would need to be hedged a bit.
09:23:01 Preliminary slides page URL will be: http://www.w3.org/International/multilingualweb/lt/wiki/WS5_Preliminary_Presentations
09:23:16 thanks a lot, Arle
09:23:45 jirka showing how global rules can be linked from an HTML document
09:24:13 jirka showing microdata mapping from its- attributes
09:25:31 jirka: meta element cannot be used everywhere in HTML, that might be an issue for microdata mapping of ITS
09:25:59 .. RDFa is most problematic - jirka proposes to kill that proposal
09:26:36 .. maxime had some ideas to do that; now I am worried about this mapping
09:27:12 .. RDFa is another syntax for expressing RDF triples
09:27:34 .. so the subject of the triple is the whole page, not the original piece of content
09:28:49 .. we promised a mapping to RDFa, so I'd be happy if we have a proposal to work with NIF
09:29:19 .. maxime: agree that people from the SW area need to have ideas
09:30:06 action: maxime to lead discussion on RDF serialization in ITS, with "task force" people - Sebastian, Maxime, Dave, ...
09:30:39 q?
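Maxime's point above is that ITS consumers could use CURIEs to write URIs with less verbosity; per the W3C CURIE Syntax 1.0 note, a CURIE `prefix:reference` expands by concatenating the IRI mapped to the prefix with the reference, and a "safe CURIE" wraps it in square brackets. The sketch below is a minimal illustration under stated assumptions: the prefix table (the real PROV and SKOS namespaces) is hard-coded rather than read from markup, and an unmapped prefix is treated as a full IRI (a simplification). The name `expand_curie` is hypothetical and not part of any ITS proposal.

```python
# Assumed prefix mappings; a real processor would take these from the document.
PREFIXES = {
    "prov": "http://www.w3.org/ns/prov#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand_curie(value, prefixes=PREFIXES):
    """Expand a CURIE or safe CURIE ("[prefix:ref]") into a full IRI."""
    safe = value.startswith("[") and value.endswith("]")
    if safe:
        value = value[1:-1]
    prefix, sep, reference = value.partition(":")
    if not sep or prefix not in prefixes:
        if safe:
            # A safe CURIE must use a known prefix.
            raise ValueError("unknown prefix in safe CURIE: %r" % value)
        return value  # simplification: treat it as an already-full IRI
    return prefixes[prefix] + reference


if __name__ == "__main__":
    print(expand_curie("prov:SoftwareAgent"))
    print(expand_curie("[skos:Concept]"))
```

As the 09:19:26 comment notes, such expansion would only matter for data categories whose values have CURIEs as their datatype.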
09:30:47 dave: is there a use case for RDFa expressing ITS?
09:31:11 tadej: i can provide the data, but if nobody is consuming it that's a problem
09:31:42 .. text analytics software often provides info via URIs
09:31:49 .. that can be expressed without RDFa
09:32:06 .. so we can provide RDFa easily, but does it make sense to use it?
09:32:48 .. NIF serves the issues with RDFa
09:33:40 davidF: maxime prepared great stuff
09:33:46 .. but I would ask for a use case
09:34:51 davidF: maybe a clarification of the charter would help
09:37:11 action: Felix to work on charter clarification
09:38:02 maxime: we could use ITS in RDF to localize ontologies
09:38:45 dave: are we taking localization of ontologies on board as a requirement?
09:40:03 .. also, provenance that we are working on (RDF based) does not require RDFa
09:40:36 .. question is really if we have a use case for generating an RDF graph from the content
09:41:53 jirka: other issues - how to represent in former versions of HTML
09:41:58 .. e.g. HTML 4 or HTML 3.2
09:42:16 .. I think we don't need to provide that - even in HTML 3 or HTML 4.2 you can use the its- attribute
09:42:40 action: jirka to make a clarification in the req draft about previous versions of HTML
09:43:38 above is ISSUE-19
09:43:54 jirka: XPointer - is still a working draft, will probably not be finished in time
09:44:08 ack
09:44:12 ack f
09:44:12 fsasaki, you wanted to mention importance of cooperation with other WGs wrt representation of ITS
09:44:46 maxime: we could use XPointer without the extension of range
09:44:55 .. we would have the benefits of URIs
09:45:05 jirka: so in a selector, a URI that can use an XPointer fragment?
09:45:16 maxime: yes
09:45:35 jirka: that will be used in global rules?
09:45:49 maxime: yes
09:46:00 jirka: that will be inconvenient for current ITS usage
09:46:10 .. currently you say "all titles in all chapters" via XPath
09:46:29 .. with XPointer the selection is tied to a particular document identified by the URI
09:46:40 .. so there might be no real use case to switch to XPointer
09:46:47 .. so I'd propose not to use the idea now
09:47:16 davidF: if the XPointer spec is not finished, we cannot do this
09:49:56 jirka: maxime was proposing to publish schemas with content negotiation
09:50:25 .. these techniques are controversial also in SW - need to provide things that run more automatically over HTTP
09:51:07 davidF: like the idea of content negotiation
09:51:51 maxime: not for DTD or XSD, but in other areas it might be relevant
09:52:17 felix: at dF, the content negotiation is about the schemas
09:52:52 .. so probably a different case than df has in mind
09:53:00 sebastian: you have an ontology in a sense
09:53:21 jirka: I mean ontology in terms of OWL
09:53:29 sebastian: can be easily created
09:53:55 felix: is there a use case for the ontology?
09:53:59 sebastian: not sure
09:54:06 provenance issue will be postponed
10:15:09 Pedro has joined #mlwDub
10:18:45 Rob has joined #mlwdub
10:20:04 dF has joined #mlwdub
10:21:32 Working Session: Quality Metadata
10:26:35 topic: quality metadata
10:26:52 omstefanov has joined #mlwdub
10:27:36 for more info on this session, see http://www.w3.org/TR/2012/WD-its2req-20120524/#Quality and http://www.w3.org/TR/2012/WD-its2req-20120524/#Quality_Assurance_.28QA.29
10:28:07 Phil: Their interest is language quality and QA. The language review process can be very complex.
10:28:30 .. This is an opportunity to see new approaches and solutions.
10:29:18 .. Additional support from the audience will be very interesting for the implementations of these data categories.
10:29:49 Yalemisew has joined #mlwdub
10:30:16 Arle: The target audiences for Quality Metadata are: LSPs doing QA, content creators doing quality verification,
10:30:52 .. authors marking errors and post-editors providing info on efficiency / MT quality.
10:31:46 .. Motivations: 85% of QA is spent on about 10% of content.
10:31:54 mlefranc has joined #mlwdub
10:32:01 .. so there are potential cost savings.
10:32:42 .. when capturing problems systematically, you do not know where they come from. Provenance will help.
10:33:04 q+ domain / purpose values
10:33:08 q-
10:33:14 q+ to question domain / purpose values
10:33:29 q- domain
10:33:33 ... Some other data categories (purpose, domain...) can help to build business rules
10:33:34 q- /
10:33:38 q- purpose
10:33:41 q- values
10:33:49 q-
10:33:56 q+ to ask about domain / purpose values
10:34:06 Arle: Two complex Data Categories: errors and error profiles
10:34:10 dgroves has joined #mlwdub
10:34:42 .. the point is to have data categories that are independent of the metric
10:34:45 gderiard has joined #mlwdub
10:34:51 Quality data categories in requirements doc at: http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#Quality
10:35:44 .. the data model can be very simple, but some examples seem to need more complex attributes
10:36:13 q+ to ask about why qa-note
10:36:18 .. a simple syntax can refer to the basic error parameters.
10:37:12 Richard: too many attributes and too much info in that model
10:37:23 Arle: yes, this is one of the issues
10:37:58 .. a standard markup may be a better solution, but this needs to be resolved.
10:38:31 dag: where would that be? In the content?
10:38:36 arle: in the localization content
10:38:46 dag: so in XLIFF?
10:39:21 Arle: XLIFF can be an ideal scenario, but QA can be done in other environments, processes, etc.
10:39:42 phil: we were looking into using RDF and putting that into a triple store
10:40:03 .. we want to use RDF to do cross-silo linking
10:40:06 Phil: Publishers also need to capture feedback from the QA process.
10:40:20 http://www.w3.org/QA/2012/05/interview_ibm_on_a_linked_data.html
10:41:41 Felix: in software localisation, mechanisms for tracking process status are necessary
10:42:22 Dave: Need to carefully establish the scope
10:44:04 Felix: we need to have implementations for this markup
10:45:07 .. the bottom line is implementing the Data Categories to see them in real apps.
10:46:32 quality error description with RDF using Provenance datamodel: [ a its:qualityErrorDescription ; its:qaType "..." ; ... ; prov:wasGeneratedBy [ a its:QAActivity ; ... ; prov:used [ a str:String ; str:anchorOf "verbs agrees"]]]
10:46:34 Dave: contradiction between the scope and the way of representing only one QA run
10:47:18 Richard: maybe better not to use span
10:47:34 micha has joined #mlwdub
10:47:42 Arle: a dedicated element can be used, but that gives other problems
10:48:34 Yves_ has joined #mlwDub
10:50:14 s/better not to use span/better to use a dedicated element rather than span, since it makes it easier to keep separate from the content/
10:50:15 Tatiana: target users can also be MT training and development
10:51:37 Arle: additional specifier can be necessary in terms of recognition
10:51:49 .. more slides
10:51:58 q-
10:52:32 .. profile must be flexible and capable of being used in a global manner
10:53:32 (below action is unrelated to this discussion, just so that I don't forget)
10:53:33 action: felix to check whether we can use META-SHARE for identifying resources to be used in disambiguation
10:53:45 .. ex. qualityProfile
10:54:25 Dave: do you mean by "pass" the result of the QA?
10:55:09 Arle: Yes, but it is more intended to show what was done as QA
10:56:11 Phil: you can define errors with a high granularity, but also some scores for more important errors.
10:56:37 Arle: This is a very verbose markup
10:57:47 .. implementations need a big effort, so mechanisms to know how it is being done now are necessary
10:58:28 .. and it will affect the commitments and timeframe
10:59:13 .. it is out of scope to standardize the different QA metrics
11:00:11 Yves: can you specify it, making the distinction between consumer and provider?
11:01:29 Felix and Phil: probably it is not necessary to separate consumer and producer
11:03:06 action: Dave to conclude quality discussion with Arle, including examples from existing implementation in CMS-LION - due mid July
11:03:38 FePhil: there are some metrics that people already capture and use
11:04:03 Felix: there are many tools that can use this
11:04:29 End of session
11:04:53 Yves will join the discussion, providing input about what current tools do and how that relates to the current proposals
11:04:57 moran has joined #mlwdub
11:05:05 scribe: philr
11:05:38 TOPIC: Terminology metadata
11:06:06 Tadej Stajner presenting
11:06:44 Goal: annotate fragments of text
11:07:09 Audiences: content authors, localizers, CMS, MT providers
11:08:39 Data categories: Term; Named entity; Disambiguation; Text analysis annotation
11:10:16 Annotations provide disambiguation through reference to semantic networks and ontologies
11:10:35 Challanges
11:11:04 s/Challanges/Challenges/
11:11:48 Tadej: use ITS to support HTML5
11:12:20 ...aid term matching in TM and CAT tools
11:13:20 Tadej: are there more challenges?
11:13:53 nico has joined #mlwDub
11:14:03 felix: will the TA approach lead to real-time tagging?
11:14:49 tadej: there is potential for semi-automatic tagging
11:15:41 horst_kraemer has joined #mlwdub
11:15:42 What's being tagged are candidates, not terms themselves, until they are human-validated
11:16:48 Ioannis: not realistic to have a fully automatic solution
11:16:59 ...need to find ways for semi-automatic
11:17:32 tadej: identification or construction?
11:17:52 ...construction is a hard problem
11:18:44 r12a: ITS 1.0 was concerned with term definition, is this different?
11:19:47 fsasaki: one difference is disambiguation
11:21:51 daveL: ITS 1.0 is a reference
11:22:48 tadej: what are dereferencing scenarios?
11:23:17 ...TBX/RDF
11:24:07 ...would like to see some kind of retrieval mechanism
11:25:18 pedro: candidate proposal interesting
11:26:02 ...content authors will need tool assistance to mark terms
11:26:45 ...glossaries are closely guarded by companies
11:27:08 ...need to link/map to proprietary glossaries
11:27:49 fsasaki: can this be used by MT providers?
11:28:19 MT providers have their own methods to disambiguate
11:28:55 ...problem of extending lexicons online
11:29:27 ...client terminology supplied with translation task
11:30:16 dgroves: open question of how SMT would use this
11:30:29 i'm wondering whether term definition and cross-language term equivalence should be in the same data category
11:30:39 ...difficult on-the-fly consumption
11:31:34 tadej: does this information help?
11:32:02 dgroves: unanswered question
11:32:59 dF: on-the-fly more useful for rules engines
11:34:11 tattiana: promising initiative.
11:34:24 ...could be a foundation
11:35:39 tattiana: 30% increase in MT quality from terminology related work
11:36:33 Are proprietary glossaries not linked to public repositories?
11:37:53 Ioannis: term candidates go through a rigorous process of approval - can take months
11:38:03 ...in enterprises
11:38:36 To note: the use of terminology in MT research is generally considered a type of domain adaptation
11:39:07 ...customers need help with terminology
11:40:03 Term lifecycle phase attribute?
11:40:46 s/tattiana/tatiana/ 11:41:52 tatiana: important to distinguish between aquisition and recognition 11:44:07 tadej: annotationAgent special case of provenance 11:44:43 ...example markup being presented 11:45:01 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html fsasaki 11:45:50 Ioannis: we have linking lexicon links 11:46:47 Necessary to have more than one term bases; product specific; cascading 11:47:02 ...client specific 11:49:54 tadej: stand-off markup cleaner 11:50:27 ...in favour of inline by default but need portability 11:50:53 session closed 11:52:23 rrsagent, draft minutes 11:52:23 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html r12a 11:53:15 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html fsasaki 12:56:03 ChrisLyons has joined #mlwDub 12:58:21 Milan has joined #mlwdub 12:59:13 mhellwig has joined #mlwdub 13:00:08 rrsagent, where am I? 13:00:08 See http://www.w3.org/2012/06/12-mlwDub-irc#T13-00-08 13:01:36 leroy has joined #mlwDub 13:05:26 scribe: Milan 13:05:43 Topic: Updating ITS 1.0 13:06:42 dF has joined #mlwDub 13:07:28 Tadej: Managing a lifecycle of terms 13:09:45 .. Confidence of the annotation (named entity) 13:10:13 .. difficult for some approaches 13:12:43 mlefranc has joined #mlwdub 13:13:14 .. Disambiguation for distinct words 13:17:08 dgroves has joined #mlwdub 13:17:23 action: Tadej to create a summary of implementation status of Terminology Metadata Generation 13:18:30 My notes from representation session are at http://lists.w3.org/Archives/Public/www-archive/2012Jun/att-0018/representaion.html 13:19:17 Felix: MLW-LT must support all ITS 1.0 and their functionality 13:20:30 Yves: How to distinguish 1.0 in 2.0? 13:20:48 Felix: We will have references (e.g. 
to Ruby) 13:21:49 Jirka: Prefer to keep Ruby in 2.0 13:24:17 nico has joined #mlwDub 13:25:36 Zakim has left #mlwDub 13:27:48 action: Jirka to summarize usage of Ruby in DocBook 13:28:48 rrsagent, draft minutes 13:28:48 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html Milan 13:30:57 fsasaki has joined #mlwdub 13:31:08 scribe: fsasaki 13:31:21 topic: presentation from alex lik 13:32:14 alex: localized publications - instructions for use, release notes, systems messages in a GUI, ... 13:32:38 .. full localization of GUI 13:33:18 .. tagset for XML - textcontainer for software is XML. If the ITS tagset works for any XML, software localization can benefit from that too 13:33:35 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html fsasaki 13:34:15 alex: medical device manufactures have a highly regulated environment 13:34:43 .. local regulations, CFR (QSR), ISO 13485, IEC 60601, directive 93/42/EEC 13:35:08 alex: challenges: 13:35:20 .. variety of authoring platforms, even within one company 13:35:33 .. end user materials is in DITA XML 13:35:56 .. having a tagset for "any XML" is a good condition 13:36:03 .. but we want to go further to have real single sourcing 13:36:24 .. modification of underlying format (and in the tagset) leads to changes in the fragmentation 13:36:40 .. that leads to changes when we analyse the material for localization price quotes 13:37:02 .. there is material that has been translated in XML; when you send it to HTML the price quote is comparable to the original one 13:37:34 .. there are materials that are in word documents that are not legal templates 13:38:27 .. there are many companies that have mandatory end user material 13:38:31 .. and other types of material 13:38:43 .. on one hand the separation is logical, but it can also impose problems 13:38:56 .. the localization costs can growth tremendously 13:39:38 .. 
having all content in the same repository & container will be helpful 13:40:00 .. question of information architecture - do we need to train developers to work with ITS? 13:40:27 moritz: depends a bit - one issue we are seeing. 13:40:36 .. the end user might not have a technical background 13:40:58 moritz: we are looking into finding ways for having interfaces for users 13:41:18 alex: I am talking about content managers & information architects 13:42:41 felix: these people need to understand XPath at least to be able to write some useful rules 13:43:15 olaf: documentum would produce a lot of pages, but multilingualism is not taken into account 13:43:34 .. there is a huge re-training of the information architecture people necessary to understand internationalization 13:43:56 alex: thanks a lot for that comment 13:44:20 .. back to the challenges - terminology mgmt was mentioned before 13:45:53 .. we can have our content in DITA, output in HTML 13:46:42 .. re-publishing is important. 13:47:02 pedro: some implementations we are doing give you some background: 13:47:16 .. you cannot put in the CMS all complexity of the localization progress 13:47:23 .. you have to connect through e.g. a gateway 13:47:32 .. to a platform that can do your requirements 13:47:46 Marion_Shaw has joined #mlwDub 13:47:58 expectations: compatibility with DITA XML 13:48:04 .. integrating in SW resource files 13:48:08 .. interop with XLIF 13:48:12 .. terminology mgmt 13:48:22 .. removal and re-integration of ITS markup 13:48:31 .. ease of implementatio for tool vendors 13:48:35 .. visible ease on the bill 13:48:52 alex: my main point is about the XML deep theb 13:49:42 .. not so much about CMS 13:51:04 .. 
take-away for me: educational aspects are missing for authors and others 13:51:12 topic: presentation from des oates 13:51:17 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html fsasaki 13:51:48 des: focusing on the processes that we adopt within Adobe 13:53:16 .. I am interested in this group because it covers three domains that are interesting for me: 13:53:24 .. content creation, publication, localization 13:53:59 .. publishing happens in different ways: raw HTML using CQ5 CMS 13:54:02 .. we publish software 13:54:22 .. and also documentation 13:54:33 omstefanov has joined #mlwdub 13:55:09 .. localizable content in source control systems, 13:55:28 .. a lot of content in multiple repository formats 13:55:40 .. framemaker, CQ5 again, web CMS 13:56:19 .. translation services used internally: TMS, another TMS, ... 13:56:41 .. adobe translator https://community.translate.adobe.com 13:57:21 .. many inputs, many outputs 13:57:24 .. a lot of complexity! 13:58:18 .. 18 months ago we created an internal mediation layer, connecting authoring / publication / translation together 13:58:25 .. you still have the three domains 13:58:44 .. they are connected to mediation layer 13:59:06 .. that supplies filtering / normalization, leverage, terminology / QA check, MT service 13:59:34 Jirka has joined #mlwDub 13:59:40 .. MT is an abstraction layer that allows to plug in various components: 13:59:46 .. moses, external MT providers, ... 13:59:54 .. we access them all through a set of APIs 14:00:34 .. each of the services is a potential consumer or provider of the metadata that we are discussing 14:00:41 .. they are decoupled, but they work together in workflows 14:01:00 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html fsasaki 14:01:09 des: we have many translation processes 14:01:22 .. we have to match different business requirements 14:01:40 .. 
we don't want to create customizations all the time, customizing connectors etc. 14:02:08 .. what is the purpose of the mediation layer? 14:02:38 .. example: MT workflow: from CMS > XML, normalization process, XLIFF transformation, leverage of XLIFF 14:02:46 .. (check if everything is re-usable) 14:03:03 .. if it is not re-usable it will not be propagated through the workflow 14:03:38 .. after machine translation, content goes to post-editors 14:03:52 .. after that content goes to XML and CMS (HTML) 14:04:04 .. that's a typical workflow that we deploy with our platform 14:04:11 .. where would the metadata be important? 14:04:31 .. in MT service: translate, in TMS: loc note and disambiguation 14:04:39 .. above are just examples 14:05:00 .. another example workflow: user-generated content, also with XLIFF and MT 14:05:05 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html fsasaki 14:05:35 .. above is a real-time process: user says "I want to have the page in a different language", clicks, and gets the content 14:05:54 .. important here: translate, disambiguation, provenance metadata 14:06:15 .. we need the metadata for our SOA-based localization 14:06:34 .. without a standard form of metadata we will lose data 14:06:44 .. provenance is important 14:06:50 .. ITS 2.0 should solve parts of these problems 14:07:11 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html fsasaki 14:07:17 des: beyond the data categories 14:07:52 .. I have an additional set of requirements, in addition to markup / attributes etc. 14:08:18 .. it should be straightforward to establish which subset of ITS 2.0 an implementation supports 14:08:39 .. in SOA, we need to know what metadata a system supports 14:09:03 .. that's orthogonal to data model conformance 14:09:36 .. private use of data categories by unknown domains and organizations should be considered 14:10:08 des: beyond / in concert with ITS 2.0 14:10:26 ..
ITS 2.0 will solve parts of the problem; other components will need to be addressed too 14:10:39 .. standardization of content packaging, see e.g. the Linport project 14:10:50 gderiard has joined #mlwDub 14:11:12 des: standardization of service boundaries 14:11:33 Here service boundaries = standard APIs for the various services. 14:12:13 des: clear opportunity for standardization of APIs 14:12:28 .. could help integrating terminology systems with workflows 14:13:42 dave: really enjoyed your diagrams, want to re-use them for our use-case document 14:13:48 .. also agree about service boundaries 14:14:01 .. not sure if this could be a deliverable of the working group 14:14:33 .. but having examples / slides like that to communicate the problem is very helpful 14:15:09 des: yes, I understand that at the moment it is out of scope, but we need to ensure that it is taken up at some point 14:15:31 dave: I also agree that for a service you need a clean way to state what data categories you support 14:16:19 des: it is part of the business agreement, needs to be very clear 14:17:23 Felix: We have conformance statements for 1.0; for 2.0 we need to make it clearer that implementers must provide these statements. We should be clearer about what must be provided: e.g., machine-readable, human-readable, various implementations 14:18:16 dave: some things are important for the informative part of the spec 14:18:31 .. e.g. in very clear best practices documents, see Richard's example 14:19:23 xyz: in the "example workflow", is there a way to transform various formats into HTML? 14:19:47 des: XLIFF is the interchange format that we use across the platforms 14:20:10 ..
normally we would use XLIFF to transfer data from one service to another service 14:20:26 r12a has left #mlwdub 14:20:30 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html fsasaki 14:20:50 r12a has joined #mlwdub 14:20:50 s/xyz/Kerstin/ 14:20:59 davidF: XML is a transition format 14:21:14 .. if you start with HTML you will have HTML, same for other formats 14:21:27 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html fsasaki 14:21:39 davidF: XSLT is the transformation language 14:21:52 kerstin: would it make sense to convert the lexicons into HTML already? 14:22:06 des: don't see the use case for HTML, since that's a publication format 14:23:15 .. if you have Word or PageMaker etc., you just need a filter to convert things to XLIFF or another interchange format 14:23:20 .. that's the rationale for the conversion 14:24:19 moritz: one issue - what if we get metadata that we don't support? 14:24:46 des: it is important to know what you expect 14:25:27 This is an interesting issue: when can you strip metadata? What happens when you get back metadata invalid for your domain> 14:25:29 moritz: should metadata be stripped out? 14:25:37 s/domain>/domain?/ 14:25:38 davidF: it is important to have defaults 14:25:56 .. there are ways around it even if you don't support everything 14:26:31 Felix: One last point. I will take an action item to come up with examples of implementations and what they can do, so we can see if they fulfill your requirements. 14:27:06 action: felix to come up with examples of SOA-related presentation of metadata capabilities for Des's requirement 14:27:50 moritz: an ontology of process states - can we agree on that? 14:27:58 dave: we cannot standardize the process 14:28:09 .. we can just try to "normalize the language" 14:28:22 ..
comes down to people like Des and Dag who have a view of the whole process 14:28:45 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html fsasaki 14:29:08 Agreed with Dave. The moment you claim to standardize the process, it creates problems. What you can do, however, is standardize the boundaries (per Des). Treat the process as a black box in your definitions, but one with well-defined inputs and outputs. E.g. don't tell localization HOW to do things, but you can say what it must produce at the end of the day. 14:29:56 action: moritz, dave and others to look into process areas 14:30:14 davidF: interest is cooperation 14:31:47 felix: need to make sure that we resolve this in a timely manner 14:32:03 dave: this is also a part of our public relations work, not so much the data categories 14:32:12 .. people in the working group are familiar with the terminology 14:34:08 Felix: We need to have things resolved ASAP, by July. 14:34:28 davidF: will work on that 15:01:25 moran has joined #mlwdub 15:02:59 dF has joined #mlwDub 15:03:19 dF_ has joined #mlwdub 15:03:31 gderiard has joined #mlwDub 15:08:23 scribe: tadej 15:08:27 topic 15:08:45 topic: Bryan Schnabel: XLIFF Extensibility and Metadata 15:08:55 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html fsasaki 15:10:53 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html fsasaki 15:12:07 Bryan: There are three main ways to extend XLIFF 1.2: elements, attributes, attribute values 15:12:14 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html fsasaki 15:13:06 ... with elements, you can use the usual XML namespace declaration mechanism 15:13:45 ... similarly with attributes, where it is allowed. 15:14:25 mhellwig has joined #mlwdub 15:14:26 ... with attribute values, you can prepend x- to your value where none of the existing options work for your use case. 15:16:09 ...
There is a DITA Open Toolkit XLIFF/DITA roundtripping tool you can use 15:17:30 micha has joined #mlwdub 15:18:30 ... It's implemented as an Ant tool, and the tools also keep the original PDF to preserve the context of the content 15:19:19 ... The roundtripping XSLT required two distinct operational modes: the skeleton mode and the body mode. 15:19:53 ... A custom namespace preserves the formatting information, so it is fully reconstructible. 15:21:13 ... Here, I'm demonstrating a Drupal module for this. 15:22:32 ... All Drupal information gets stored in a custom namespace. Some of the metadata also shows up in the XLIFF namespace. After the XLIFF file is translated, the plugin can import it and everything works. 15:23:37 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html fsasaki 15:24:08 ... In XLIFF 2.0, extensibility is more restricted. Previously, people re-implemented XLIFF functionality in their extensions, which we do not want. 15:24:59 ... We then allowed custom namespaces for elements that are not already in XLIFF 2.0. 15:26:25 ... After discussion, the WG vote tied between allowing elements-only and elements-and-custom-namespaces. 15:27:16 Just as a reminder to folks, I am posting all slides here: http://www.w3.org/International/multilingualweb/lt/wiki/WS5_Preliminary_Presentations. 15:27:42 Bryan: The tie still stands; the TC will decide shortly - you can all get involved and influence the discussion. 15:30:28 mlefranc has joined #mlwdub 15:31:04 r12a: What about backward compatibility? Will XLIFF 1.2 users be stranded? 15:31:26 dF: XLIFF is a perishable transport format; it is usually not persisted. 15:31:58 fsasaki: If some extensibility is used during the generation of the metadata in a particular use case, the whole extensibility layer needs to be rewritten for 2.0 15:32:39 Yves_: For more complex metadata, we want to structure elements in the document and carry that information forward.
 15:33:03 DagS: How do you map ITS to XLIFF? i.e. how do you map a tag to XLIFF? 15:33:47 Yves_: For , there is a specific element that is compatible. As long as we don't do complex terminology stuff, it's fine. If we do complex things with other data categories, we can use namespaces to extend. 15:34:43 Yves_: In XLIFF 1.2, you could use ITS as a namespace extension. 15:35:50 daveL: The extensibility seems to have implications for validation, making it more complex. For instance, implementers should be able to say which parts are supported in their implementations. 15:36:21 ... Do you have criteria for what kinds of extensions are "acceptable" to you? 15:36:54 Yves_: Even internally in XLIFF, many components are modular and live in module namespaces which extend the core. 15:37:28 Yves_: That strategy could be good for supporting ITS tags within XLIFF. 15:38:39 Bryan: If the current extensibility strategy doesn't work for you, you can raise your voice. 15:39:09 DagS: If it goes in this direction of restricting extensibility, that is worrisome for the ITS ecosystem. 15:40:52 Des: We spent a long time shaping the requirements in the content domain, and we need to spend some time on supporting ITS in the localization domain. XLIFF is the workhorse there, and ITS needs to be integrated with XLIFF 2.0. 15:41:30 DagS: Is restricting extensibility really going to help with the new implementations? 15:42:08 Yves_: One unsolved issue in 1.2 was segmentation representation. However, there is a mapping from 1.2 to 2.0. 15:42:28 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html fsasaki 15:43:47 Bryan: The people against extensibility are just listening to the community, which voiced legitimate concerns about extensibility. The opposition's opinion is that the problem was not extensibility, but misuse of it. We can solve this by enforcing conformance clauses and setting expectations. 15:44:00 DagS: Can there be a special case for ITS?
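Bryan's three XLIFF 1.2 extension points (custom-namespace elements, foreign attributes, "x-" attribute values), together with Yves's remark that ITS can be carried as a namespace extension in XLIFF 1.2, can be illustrated with Python's standard `xml.etree.ElementTree`. This is a minimal sketch under stated assumptions, not the DITA Open Toolkit or Drupal tooling that was demonstrated: the `urn:example:cms` namespace, its `node` element, the `x-cms-teaser` restype value and the `page.html` file name are hypothetical; `its:locNote` is used only as an example of a foreign-namespace attribute, and the native XLIFF `translate` attribute stands in for the ITS Translate data category.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ITS_NS = "http://www.w3.org/2005/11/its"
CMS_NS = "urn:example:cms"  # hypothetical custom namespace for CMS data

ET.register_namespace("", XLIFF_NS)
ET.register_namespace("its", ITS_NS)
ET.register_namespace("cms", CMS_NS)


def xlf(tag):
    """Qualify a local name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def add_unit(body, unit_id, text, translate, note, node_id):
    attrs = {
        "id": unit_id,
        # Native XLIFF 1.2 attribute playing the role of ITS Translate.
        "translate": "yes" if translate else "no",
        # Extension by attribute value: a hypothetical "x-" prefixed restype.
        "restype": "x-cms-teaser",
    }
    if note is not None:
        # Extension by attribute: a foreign-namespace (ITS) attribute.
        attrs[f"{{{ITS_NS}}}locNote"] = note
    unit = ET.SubElement(body, xlf("trans-unit"), attrs)
    ET.SubElement(unit, xlf("source")).text = text
    # Extension by element: a hypothetical custom-namespace element,
    # appended after the XLIFF content of the unit.
    ET.SubElement(unit, f"{{{CMS_NS}}}node", {"nid": node_id})
    return unit


def build_xliff(units, original="page.html"):
    """Serialize (id, text, translate, note, node_id) tuples as XLIFF 1.2."""
    root = ET.Element(xlf("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(root, xlf("file"), {
        "original": original,  # hypothetical source file name
        "source-language": "en",
        "datatype": "html",
    })
    body = ET.SubElement(file_el, xlf("body"))
    for unit in units:
        add_unit(body, *unit)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_xliff([
        ("1", "Welcome to our site", True, "Keep it short", "42"),
        ("2", "ACME-9000", False, None, "43"),
    ]))
```

With the two sample units, the first unit is translatable and carries an `its:locNote` attribute, the second has `translate="no"`, and both carry the hypothetical `cms:node` element at the end of the unit. Whether such foreign content survives a given tool chain is exactly the conformance question raised in the discussion.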
 15:44:20 Bryan: Possibly - see the XLIFF mailing list. 15:45:19 r12a: They should be closely tied. You don't need the its: namespace to use our data categories; you can also integrate the data categories into the XLIFF markup and provide conformance support. 15:45:34 Bryan: That's not being planned, but people would listen to such suggestions. 15:46:42 dF: There will be custom namespaces. Even if XLIFF by default won't have the extensibility, ITS can still be suggested as an OASIS namespace and module. 15:46:58 ... There is a proposed feature in XLIFF 2.0 for ITS support. 15:48:12 r12a: I would really like to move this even closer: I realize it's still an optional extension. Ideally, support would be integrated in the core. This doesn't affect inline markup? 15:49:25 fsasaki: There are other communities that are affected by this policy. 15:49:31 Yves_: Yes, like TBX. 15:50:15 daveL: Do we have any consensus here? How many people would like to say "Please keep namespaces in 2.0"? 15:51:02 DagS: We want to have XLIFF support ITS. The exact implementation is not that important (namespaces, modules, etc.) 15:51:41 Bryan: If you feel passionate about that, comment yourself. 15:51:44 action: felix to write a mail to XLIFF TC, check that on Wednesday morning again 15:52:43 topic: Localization Requirements 15:53:14 Yves_: We introduce the data category idValue as a selector of content. 15:53:49 ... Some discussion was about what we need to identify: segments? How do we select the IDs? 15:54:57 mlefranc: This could be relevant to the provenance discussion tomorrow. 15:55:21 daveL: We need to be able to point to a portion of text when we talk about provenance. 15:56:06 ... There are several ways: introducing new tags, re-using existing markup, or doing completely stand-off annotation. 15:56:32 Yves_: I am referring to existing IDs in the document, which should be persistent throughout the process. 15:56:51 ... Data categories are discrete; the more orthogonal the better.
 15:57:26 daveL: How important is this requirement for mapping translation units? 15:57:47 Yves_: It's important for localization, but XLIFF has its own ID space. 15:58:40 kurzum has joined #mlwdub 15:59:43 fsasaki: Let's continue this discussion tomorrow; there's also the targetPointer debate. 16:00:18 scribe: daveL 16:00:57 TOPIC: BCP 47 Developments 16:01:22 Mark Davis presents remotely 16:02:41 ...introduces the Unicode locale/language ID, which is based on BCP 47 16:03:15 ...there are extensions to these codes 16:04:20 ...extension U relates to locales with various calendars, phone number formats, digit sets etc. 16:04:47 ... e.g. Arabic with Arabic-Indic digits or Western digits 16:05:33 ... T extensions indicate transformed content, see RFC 6497 16:05:55 ... e.g. transliteration, translation, transcription 16:06:52 ... intended for interchange circumstances where there is no structured way of expressing the transform 16:08:24 ... options can indicate transform mechanism, input method, keyboard method and a specific one for machine translation - plus ones for private use 16:09:28 ... resources available for choosing language tags and extension fields and subfields 16:11:53 Yves asks for confirmation that the ID is both an indication of a future action and can be used to report past actions 16:12:17 Mark suggests using differing tag codes for these different uses 16:13:12 ...
but confirms that the coding scheme could be used for either 16:14:01 Dag asks for confirmation that this is not intended to replace the locale tag 16:15:53 Mark states this could be used to tag content as being in one language but additionally that it was machine-translated from English 16:16:45 Dag is concerned that if this replaced the locale tag it could confuse implementers, overriding existing functions that understand plain BCP 47 language tags 16:18:14 Mark responds that the tag could be ignored if the application does not understand it 16:19:06 David Filip reiterated the concern that overloading the tag for both instructions and reporting would be dangerous 16:20:06 Mark responds that it's up to the author of the content to decide how to use the tag 16:20:43 David expanded to ask if the usage should be context-driven 16:22:09 Mark responds that it can be used for request and response, and the response doesn't have to conform to the request 16:22:27 ... but does expect to use this in a richer environment, e.g. an XLIFF document 16:23:29 r12a states that the attribute should dictate whether this is information or an instruction 16:24:14 Jirka asks if there had been consideration of using such tags in HTTP headers 16:24:51 Mark states that the intention is that this should be used in a very wide range of circumstances and languages 16:25:27 ... the purpose of putting it in BCP 47 is to support this wide variety of contexts 16:27:02 Felix asks if process tracing data would use this tag, or whether it would be in more structured markup 16:28:26 Mark responds that this could be used in other standards, and perhaps the constituent codes could be used 16:29:09 Felix asks for some guidance on the usage of this extension; he will follow up.
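The U and T extensions Mark described can be explored with a small tag splitter. This is a minimal, non-validating sketch under simplifying assumptions: the tag is lowercased, grandfathered tags and U-extension attributes are ignored, U keys are taken to be two characters, and T field keys a letter followed by a digit, as in RFC 6497. The example tags are illustrative: `ar-u-nu-latn` asks for Arabic with Western (Latin) digits, `de-t-en` marks German content transformed from English, and the `und` value after the `t0` (machine translation) key is only a placeholder.

```python
import re

# Field keys in the RFC 6497 "t" extension are a letter followed by a digit
# (m0 = mechanism, i0 = input method, k0 = keyboard, t0 = machine translation,
# x0 = private use, among others).
T_FIELD_KEY = re.compile(r"[a-z][0-9]")


def split_extensions(tag):
    """Split a language tag into base subtags and singleton extensions.

    Simplified: lowercases everything, does not validate subtags and does not
    handle grandfathered tags.
    """
    base, exts, current = [], {}, None
    for sub in tag.lower().split("-"):
        if current == "x":
            exts["x"].append(sub)  # private use runs to the end of the tag
        elif len(sub) == 1:
            current = sub  # a singleton starts a new extension
            exts[current] = []
        elif current is None:
            base.append(sub)
        else:
            exts[current].append(sub)
    return base, exts


def parse_u(subtags):
    """Unicode locale extension: two-character keys, each followed by types.

    Attributes that precede the first key are ignored in this sketch.
    """
    keywords, key = {}, None
    for sub in subtags:
        if len(sub) == 2:
            key = sub
            keywords[key] = []
        elif key is not None:
            keywords[key].append(sub)
    return {k: "-".join(v) for k, v in keywords.items()}


def parse_t(subtags):
    """Transformed-content extension: optional source language, then fields."""
    source, fields, key = [], {}, None
    for sub in subtags:
        if T_FIELD_KEY.fullmatch(sub):
            key = sub
            fields[key] = []
        elif key is None:
            source.append(sub)
        else:
            fields[key].append(sub)
    return "-".join(source) or None, {k: "-".join(v) for k, v in fields.items()}


if __name__ == "__main__":
    for tag in ["ar-u-nu-latn", "de-t-en", "de-t-en-t0-und"]:
        base, exts = split_extensions(tag)
        print(tag, base, exts)
        if "u" in exts:
            print("  u:", parse_u(exts["u"]))
        if "t" in exts:
            print("  t:", parse_t(exts["t"]))
```

The parser only reads the tag. As r12a noted, whether such a tag is an instruction or a report of a past transformation is not encoded in the tag itself and has to come from the attribute or context that carries it.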
 16:29:35 rrsagent, generate minutes 16:29:35 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html daveL 16:29:52 TOPIC: wrap-up for today 16:29:54 action: felix to follow up on BCP 47 "t" guidance in the i18n Core Working Group 16:30:28 Arle encourages people not already in the group to get more involved in the WG 16:30:59 I also wanted to make the point that lang= or xml:lang= in a web page ONLY means that the content within the associated element is in a particular language - if you want to indicate some other thing, such as the language of an external resource or a request for information in a particular language, you need something else (see http://www.w3.org/International/questions/qa-when-xmllang) 16:31:11 Arle recognises CNGL's sponsorship of this event 16:31:17 this is something that often trips up working groups 16:31:31 ... and highlights the next MLW workshop in March 2013 in Rome 16:31:39 link to the LocWorld event at http://www.localizationworld.com/lwseattle2012/feisgiltt/ 16:31:48 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html fsasaki 16:32:02 dF has joined #mlwdub 16:32:48 ... and there is a workshop planned on cross-institute interoperability called FEISGILTT, co-located with LocWorld in Seattle in October 2012 16:33:18 FEISGILTT 2012 call for papers: http://www.localizationworld.com/lwseattle2012/feisgiltt/ 16:33:32 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html fsasaki 16:33:35 ...thank you and see you tonight in the Turk's Head and tomorrow on the second day 16:33:43 rrsagent, generate minutes 16:33:43 I have made the request to generate http://www.w3.org/2012/06/12-mlwDub-minutes.html daveL