08:06:24 RRSAgent has joined #mlw 08:06:24 logging to http://www.w3.org/2012/03/15-mlw-irc 08:06:30 meeting: MLW-Workshop 08:06:33 chair: Richard 08:06:36 scribe: various 08:06:54 agenda: http://www.multilingualweb.eu/documents/luxembourg-workshop/luxembourg-program 08:07:00 topic: introduction 08:07:11 Richard introduces the project and the workshop 08:07:22 I have made the request to generate http://www.w3.org/2012/03/15-mlw-minutes.html fsasaki 08:08:35 present: Richard, mlwPeople, workshopAttendees 08:08:59 tadej has joined #mlw 08:11:23 Nicoletta has joined #mlw 08:12:03 aaronb has joined #mlw 08:12:29 Dominic-Jones has joined #mlw 08:12:32 monica has joined #mlw 08:13:25 lbellido has joined #mlw 08:13:33 brian-joomla has joined #mlw 08:14:54 AndroUser has joined #mlw 08:15:27 topic: welcome by Piet Verleysen 08:16:04 Piet: Europe has 24 official languages 08:16:16 .. we still have difficulties today 08:16:28 .. it is not surprising that people have problems understanding each other 08:16:49 .. the web helped a lot in moving communication between people forward 08:17:11 .. in the mlw project it is fundamental to bring the experts of the world together to make things easier 08:17:21 .. all websites should be multilingual 08:17:37 .. it is still difficult to combine 100-200 languages in a website 08:17:47 .. it should be easy to have access in any language for users 08:18:07 .. it should be easier for the linguists to transfer information between languages 08:18:17 .. using the linguists' advanced tools 08:18:34 .. important problem, you should find the right (technological) solutions to make life easier 08:18:52 .. after 60 years of computer development I'm surprised how technology makes life complicated 08:19:08 .. integration and ease of use of tools is still far away 08:19:41 ..
this is today - multilingualism on the web and in our life is a complex problem 08:20:08 .. if you want to share enthusiasm and passion, it is important to have identification 08:20:37 .. I saw a logo of a project - be aware of the importance of that, so that people identify you 08:20:50 .. please use the right symbol in your work 08:21:00 .. this is the 4th workshop, a milestone in the project 08:21:14 .. please come up with some conclusions to make our life easier 08:21:26 .. to achieve that you need standards and interoperability 08:21:46 .. there is no policy session here - policy is important, without policy support nothing will happen 08:21:56 .. your "best solution" will not be used in the world 08:22:19 .. I'm sure our colleagues from DG INFSO are aware of this and Kimmo Rossi will work hard on this 08:22:53 .. I wish you good work and look forward to seeing your results that make life of users, linguists, technicians easier 08:23:12 topic: Presentation from Kimmo Rossi 08:25:19 chaals has joined #mlw 08:25:43 Kimmo: 4000 people working in Luxembourg in translation 08:26:23 .. glad to be here - most of this is close to this building (Jean Monnet building) 08:27:12 .. DG INFSO projects behind this event: MultilingualWeb and MultilingualWeb-LT http://www.w3.org/International/multilingualweb/lt/ 08:27:59 .. we are in a re-organization process, name of departments might change 08:28:17 .. Richard mentioned that we are funding two projects behind this event 08:28:56 .. new project MultilingualWeb-LT - this is the wrap-up of the "heritage project", and the start of MultilingualWeb-LT 08:29:19 .. the follow-up project will take the message about the gaps and the challenges to build practical reference implementations that mean something to the industry 08:29:39 kimmo: it is very focused on machine translation, content management and localization 08:29:53 ..
"LT" of course stands for "language technologies" 08:30:32 .. we had planned to combine this workshop with a showcase of European projects, but that will be separate 08:30:46 .. first at LREC in Istanbul in May with an exhibition of European projects 08:31:14 .. and META-FORUM event organized by META 08:31:36 .. join the alliance of META to demonstrate a push for language technology 08:31:48 .. it gives you visibility and new businesses 08:32:10 .. the META-FORUM event Brussel 20-21 will feature an exhibition of LT projects in our portfolio 08:32:48 .. about future opportunities: the "Connecting Europe facility" (2014-20) 08:33:02 .. it is most concrete opportunity to demonstrate what LT can deliver 08:33:24 .. CEF consists of several parts (roads, energy grids) and digital service infrastructures 08:33:40 .. these infrastructure contains "multilingual access to online services" 08:33:46 .. that is our part in CEF 08:34:05 .. I will suggest a breakout session tomorrow about what that part of CEF should contain 08:34:23 .. idea is to have "language services available everywhere" 08:34:46 .. idea is not to take things away from the industry, but provide a platform to share and trade, for industry, public sector and citizens 08:34:59 .. and aim is to make the web truly multilingual 08:35:15 .. if you have further questions I can tell you more in the breaks or during the breakout session 08:35:23 I have made the request to generate http://www.w3.org/2012/03/15-mlw-minutes.html fsasaki 08:37:24 topic: Presentation by Ivan Herman 08:37:27 I have made the request to generate http://www.w3.org/2012/03/15-mlw-minutes.html fsasaki 08:39:02 Ivan: what is semantic web about? 08:39:32 .. for some people semantic web is a "knowledge management system", with big ontologies 08:39:48 .. other people don't care about ontologies, they think about large amounts of data 08:39:56 .. others think about enhancing search 08:40:01 .. others about integrating data 08:40:16 .. 
so people do this and that. Example from a Chinese university 08:40:41 .. incredible wooden structure, beautiful but complicated 08:40:52 .. people described the knowledge of how the structure was put together 08:41:13 .. and they created beautiful videos showing that - this is knowledge management at its best 08:41:20 .. next example: medical application from the US 08:41:41 .. takes a lot of data. Aim is to personalize the data, combine the data, extract knowledge etc. 08:42:05 .. BBC has pages on music and musicians, example of "Eric Clapton" page 08:42:48 .. BBC does not author the facts themselves - they have a system that aggregates the data from other providers, so again a very different application 08:43:01 .. another example: IMDb - gives reviews on movies 08:43:15 .. in the source they have added additional structured data (microdata) 08:44:05 .. that will be used by Google in search. The Google search result shows a 4-star assessment of the movie, taken from the site during crawling 08:44:19 .. this is the current state of semantic web: we have many application areas, see above 08:44:37 .. the general idea behind all this is: there is a lot of data on the web 08:44:48 .. more and more applications rely on the existence of the data 08:44:57 .. we do not want data silos 08:45:08 .. imagine a web that had documents but without links between them 08:45:21 .. real value of the web is not pages on the web, but links between pages 08:45:39 .. example of three different interfaces related to neurobiological issues 08:46:16 .. they have three different interfaces, and the databases that need to be combined are hard-wired 08:46:38 .. via the web, we can achieve linkage between such data silos, so that the data is a kind of unity 08:46:54 ..
semantic web is a set of technologies with the real goal to build a web of data 08:47:45 Ivan: in the longer term, we want to see the whole web as a huge global database 08:48:06 .. as a long term goal 08:48:18 .. what is happening at W3C today? 08:48:21 .. in that area 08:50:14 .. RDF is the data format for semantic web 08:50:36 .. SPARQL is used to query RDF data, like SQL for relational databases 08:51:17 .. SPARQL is about graph patterns in the semantic web "graph" 08:51:37 .. SPARQL has been a standard for some time. Now we are working on new features 08:52:40 Ivan describing new features in SPARQL 08:54:08 Ivan: SPARQL has already been a unifying point between semantic web applications 08:54:41 .. with SPARQL 1.1 this becomes more complex, but also more powerful 08:57:23 .. exporting of RDF is another topic - one approach is "direct mapping" 08:59:53 .. is good for a general conversion 09:00:16 .. but we need another step to have the graph that we really want 09:00:38 .. it is a layer on top of the direct mapping, to give additional rules for creating the RDF graph 09:01:11 .. to create what your application needs. The additional step is expressed by R2RML 09:01:54 .. both the direct mapping and the R2RML approach are currently being implemented ("candidate recommendation" phase) 09:02:15 .. should be finalized (a W3C "recommendation") by this summer 09:02:33 .. next topic: adding data to HTML pages 09:02:53 .. data per page is not much compared to databases, but still there is a lot of data 09:03:02 .. that is very valuable for search engines or other applications 09:03:56 .. two approaches: microdata and RDFa 09:04:14 .. both very similar.
RDF can be extracted from both 09:04:49 .. microdata has been optimized for "one vocabulary at a time", doesn't have data types 09:05:03 .. RDFa provides the full power of RDF, at the price of more complexity 09:05:20 .. RDFa Lite is on the same level of complexity as microdata 09:06:12 .. next topic: RDF working group 09:06:23 .. RDF itself is the basis of all semantic web technologies 09:06:58 .. it's like links from one page to the other. The only difference is that in RDF the links have a name, and there is additional infrastructure to make use of that 09:07:13 .. RDF is being cleaned up in RDF 1.1, no big changes 09:07:21 .. the Turtle serialization is being standardized 09:08:14 .. and other features are being added, only a few 09:08:30 ivan: work on RDF 1.1 began a year ago 09:08:44 .. last working group is called provenance 09:09:09 .. goal is to add metadata to data on the web like: how was the data created 09:09:17 .. revision structure, revision history 09:09:39 .. for this you want one vocabulary - that is the goal of the provenance group 09:10:00 .. there needs to be a balance between something simple and usable, and something more complete 09:10:12 .. that is the balance that the group is working on 09:12:13 .. now coming to linked open data cloud 09:12:24 .. there are a lot of data sets out there 09:12:38 .. LOD diagram is nice but a bit misleading 09:13:03 .. there is an additional diagram showing interlinkage more clearly - there are still many links missing 09:13:22 .. major challenges of Semantic web are: scale of the data 09:13:28 .. interlinkage 09:13:37 .. ability to read and write data ("SPARQL update") 09:13:49 ..
currently discussing "linked data platform WG" 09:14:03 .. to work on HTTP infrastructure to modify linked open data 09:14:31 .. other challenges: data quality, ... 09:15:05 .. other challenges: role of reasoning with the amount of data 09:15:12 .. highly distributed data 09:15:23 .. huge amount of data in a few vocabularies 09:15:37 .. how to do inferencing in this kind of setup is not easy 09:16:11 .. major challenge is really interlinked data on the web 09:16:21 .. semantic web is trying to help 09:16:34 .. about multilingual web 09:16:35 Jirka has joined #mlw 09:16:45 .. what can be the relationship between multilingual web and semantic web 09:17:06 .. I have the impression that semantic web can give powerful technologies to categorize knowledge 09:17:12 .. that can be created in different languages 09:17:36 .. linked data also gives a source of information that you can use 09:17:58 .. e.g. analyze a blog, fetch semantic web data to use for that analysis 09:18:11 .. not always for translation, but also for language specific technologies 09:19:10 ivan: semantic web has a very simple way to represent languages 09:19:18 .. we need more complex ways 09:19:26 .. English is used for all vocabularies 09:19:48 .. with the current infrastructure it is hard to reason across languages 09:20:12 .. we also have a cultural issue - we find vocabularies that are badly designed in terms of localization 09:20:27 .. a need for improvement 09:20:45 .. looking forward to the discussion to learn what you see 09:21:30 chaals: how does semantic web relate to private data 09:21:59 ivan: big question mark - how to combine access control with semantic web 09:22:19 .. currently you sometimes have semantic web applications behind firewalls 09:22:26 ..
but that's no solution 09:22:41 chaals: what do we know about how people use semantic web? 09:23:05 ivan: we know a little bit more. We had a workshop last December about how linked open data was used in the enterprise 09:23:17 .. one message was: there is lots of data here 09:23:35 .. but there is a need for low-level APIs for access 09:23:45 .. that is wanted e.g. by large companies 09:41:49 Jirka has joined #mlw 09:50:24 aaronb has joined #mlw 09:50:52 topic: presentation by Jan Nelson 09:53:08 jan: we put a lot of effort into languages in Windows 8 09:53:22 .. 109 languages in total 09:53:46 .. example: 35 mill. customers in US that speak Spanish at home 09:54:21 .. so languages are a huge opportunity 09:54:35 .. windows store: helps to deliver apps in more than 200 markets 09:54:46 .. with developer support for localization 09:54:58 .. metro style apps technology stack 09:55:08 .. lots of programming languages supported 09:55:16 .. c++, html5, etc. 09:55:40 .. multilingual app toolkit 09:55:50 .. purpose to help to manage translation 09:56:01 .. has a pseudo language engine for localization testing 09:56:44 .. now demo of the toolkit 09:57:45 .. showing a weather app 09:57:58 .. in the app preference language is now set to German 09:59:13 .. rebuilding the app, it shows up in various languages, with pseudo translation including Bing machine translation services 09:59:32 .. XLIFF files are being created on the fly to support translation 10:00:59 ..
in a separate editor translations are handled, including marking non-translatable text 10:04:05 Dominic-Jones has joined #mlw 10:04:35 topic: presentation by Tony Graham 10:04:40 I have made the request to generate http://www.w3.org/2012/03/15-mlw-minutes.html fsasaki 10:07:39 tony describing the creation of layout via XSL like the creation of the world - this is just great - sorry, I can't scribe this 10:08:22 Zakim has left #mlw 10:08:34 I have made the request to generate http://www.w3.org/2012/03/15-mlw-minutes.html fsasaki 10:08:55 tony: xsl 1.1. has a large section on internationalization 10:09:08 .. xsl has always been good on i18n: writing modes for multiple scripts 10:09:26 .. properties are defined in terms of "start and end", not "left, right, ..." 10:09:57 AndroUser has joined #mlw 10:10:09 .. xsl-fo has the concept of different baselines of text 10:10:33 .. in XSL 2.0 we want to do a lot more of internationalization 10:10:44 .. in 2008 w3c had a japanese layout taskforce 10:10:51 .. experts working to define Japanese layout 10:11:04 .. taking a japanese standard as a basis 10:11:12 .. and the expertise of Japanese layout experts 10:11:24 ivan has joined #mlw 10:11:36 .. the "japanese layout document" is useful for implementing these features 10:11:56 .. ruby applies to Japanese and Chinese (bopomofo etc.) 10:12:28 .. there is a lot of information about Japanese thanks to the layout taskforce 10:13:31 .. most translated document: "Universal declaration of human rights" 10:13:45 .. often used to compare quality of layout in various languages 10:14:09 .. these days we can cover a lot of languages just with web browsers 10:14:44 .. UDHR is also avail. in Unicode, see http://unicode.org/udhr/index.html 10:14:50 I have made the request to generate http://www.w3.org/2012/03/15-mlw-minutes.html fsasaki 10:15:03 tony: last year I worked on formatting Khmer 10:15:22 .. I used UDHR as an example, there were many issues in the Khmer layout 10:15:51 .. 
so there is a need to learn more about local needs related to layout 10:16:02 .. the japanese layout taskforce is very useful 10:16:12 .. the requirements document is used by XSL, CSS, other groups 10:16:26 .. should w3c make more taskforces? Requires more funding, efforts 10:16:45 ... easier with the w3c badge, easier to justify 10:17:09 .. or should there be a multilingual layout community group? 10:17:52 .. easy to set up, see http://www.w3.org/community/ 10:18:08 .. contributor agreement makes it easy to use the outcome 10:18:37 I have made the request to generate http://www.w3.org/2012/03/15-mlw-minutes.html fsasaki 10:20:12 topic: presentation by Richard Ishida and Jirka Kosek 10:20:25 richard: key issues currently done in HTML5 10:21:25 s/key issues/presenting key issues related to multilingual topic/ 10:21:30 I have made the request to generate http://www.w3.org/2012/03/15-mlw-minutes.html fsasaki 10:21:49 richard describing the i18n working groups in w3c: i18n core, MultilingualWeb-LT group 10:22:24 .. internationalization interest group, other mailing lists etc. 10:22:34 .. please participate and contribute, we need your support and input 10:24:12 richard: example of bidi in embedded text, visualization wrong because of missing directionality information 10:24:34 .. new "bdi" tag to create proper visualization 10:25:18 .. next topic: ruby 10:26:08 .. additional information e.g. about pronunciation of pictographic (Japanese) characters 10:26:26 .. Japanese layout document - currently producing a 2nd version of that 10:26:48 .. gives a lot of detail - would love to have this for Korean, Chinese, Arabic, Indic scripts 10:27:10 .. if you want to participate or know people who want to participate, please let us know 10:27:31 .. ruby in HTML5: there is no "rb" tag, you can put several annotations in ruby element 10:28:07 .. some problems, e.g. 
you want to highlight the ruby text itself: doesn't work because there is no tag to select 10:28:19 s/no tag/no specific/ 10:28:31 .. you can use a "span" element, but that has issues too 10:29:18 .. we are working on these questions currently, looking for advice 10:29:38 .. working also with implementors on moving this forward 10:30:45 .. now jirka about the "translate" flag 10:31:14 jirka: localization and translation have a lot of issues 10:31:34 .. "translate" flag helps with this. In many documents you have flags that should not be translated 10:32:02 .. if you use automated translation it would be helpful to have additional metadata that will help - it identifies parts not to be translated 10:33:14 .. also helpful for human translation and translation workflow in general 10:33:34 .. "translate" attribute proposal started a year ago at a multilingualweb workshop, but now it's added to HTML5 10:33:49 .. online machine translation services support this already, e.g. Bing translate and Google 10:34:09 .. it is also supported by content formats like DITA and DocBook 10:34:24 .. in the MLW-LT working group, we will work on better integration of this into HTML5 and other metadata 10:34:37 lbellido has joined #mlw 10:38:43 christian: has the "translate" flag been considered for inclusion in CSS? 10:38:49 jirka: don't think so 10:38:55 .. in CSS there are just plain strings 10:39:02 .. there is no markup to convey additional metadata 10:39:18 .. if you need to localize CSS, I propose a pre-processing step 10:39:30 richard: CSS is for presentation 10:39:40 .. it is not the content 10:39:49 .. for the bidi tags for example, you could do bidi in CSS 10:39:55 .. but we strongly recommend that you don't 10:40:03 .. because the bidi information is part of the document 10:40:31 ..
so I would propose to see CSS just as the presentation layer 10:41:04 ivan: ruby, bidi and translation 10:41:14 .. these are features that non-XML formats also want to have 10:41:19 .. like JSON, RDF etc. 10:42:36 felix: MLW-LT group will work on bringing some of the features into other formats, we should talk about how to add that into Semantic Web 10:43:00 jirka: for JSON you can have HTML inside it that contains the "translate" flag and other markup 10:43:38 .. I would hope people rather produce XML which makes it easier to make that kind of metadata 10:43:38 pedro has joined #mlw 10:44:50 xyz: question of "translate" flag 10:45:22 .. currently we create rules for several elements, not only one - what to do about this? 10:46:03 jirka: the ITS 1.0 specification has a mechanism to create such rules. It is not in HTML5, but you can combine both HTML5 "translate" 10:46:11 .. and use a processing chain with rules 10:46:27 richard: "translate" attribute is an interoperable solution 10:46:41 .. Bing translate, Google translate recognize it 10:46:49 .. there are other solutions, but they are not standard 10:47:18 for more info about "translate", see http://rishida.net/blog/?p=831 10:47:31 abc: from University of Karlsruhe 10:47:56 .. we had a scenario to annotate fine-grained localization information 10:47:58 labra has joined #mlw 10:48:13 .. how do you deal with this? 10:48:22 jan: you saw examples for Windows 8 10:48:34 .. it is up to translators to deal with what they want to represent 10:48:49 .. we support them 10:49:07 richard: if you want to translate something like Luxembourg French there is a way of labeling 10:49:24 cvaldes has joined #mlw 10:49:40 axel: we talked a lot about the translate flag in the past, happy that it was standardized 10:50:05 .. sometimes people are asking for specific translations, 10:50:26 ..
have you asked about having more values for translate to specify that? 10:51:01 jirka: in full ITS there is support to specify things like that 10:51:36 .. as part of the MLW-LT project, we are planning to have a mechanism that supports RDFa, microdata or another mechanism to include that in HTML5 and other areas 10:54:09 For Axel - support for terms in ITS: http://www.w3.org/TR/its/#terminology 10:55:35 felix: call for feedback about features of MLW-LT, please give us your feedback and let's put implementations into the centre 10:56:24 scribe: Jirka 10:56:43 Jan Nelson is introducing speakers 10:57:10 topic: Building Multi-Lingual Web Sites with Joomla! the leading open source CMS 10:57:24 by Brian Teeman, JoomlaShack University 10:57:53 ... introducing joomla cms 10:58:15 ... community project, no company behind it 11:00:25 ... joomla supports 57 languages 11:01:15 ... joomla provides 3 options for translating websites 11:01:54 ... 1 - machine translation using widgets from Google, Microsoft 11:02:48 ... quality is not guaranteed, not indexed 11:03:26 ... 2 - parallel translation using plugins, everything has to be translated 11:03:38 ... translations are indexed 11:04:08 ... the question is whether we should just translate or provide local content 11:04:43 ... 3 - sites within site, translate content only when appropriate 11:06:46 ... the key in joomla is categorise, add and show 11:09:59 ...
for each language a different menu can be provided 11:10:58 topic: How standards (could) support a more efficient web localization process by making CMS - TMS integrations less complicated. 11:11:14 by Loïc Dufresne de Virel, Intel Corp. 11:12:48 Loïc: tensions between relying on standards and using new technologies 11:15:21 ... showing ugly XML 11:16:53 ... translation handled by two plans 11:17:15 ... plan A - more automated, developed in 9 months, 6x more efficient than plan B 11:17:42 ... plan B - more manual process developed in 3 months 11:21:51 ... for interoperability, all processes have to be updated to support Unicode 11:22:47 ... maybe also to support XLIFF 11:24:08 topic: Translation and localisation in 300+ languages ... with volunteers. The best practices. 11:24:20 by Gerard Meijssen, Wikimedia Foundation 11:25:19 ... goal of wikimedia is to allow all humans to share the same knowledge, thus localization and translation are very important 11:25:37 ... wikipedia now in 283 languages, requests for 129 more 11:25:50 Tony_ has joined #mlw 11:28:58 ... problems with fonts for scripts 11:29:04 ... solved by using webfonts 11:29:31 ... there are no good free fonts for all scripts, wikimedia is supporting development of some fonts 11:29:46 ... missing input methods for some languages 11:31:18 ... using ISO-639-3, Unicode and CLDR 11:31:33 ... using TM and MT 11:31:45 ... all localizers and translators are volunteers 11:32:49 ... l10n is more expensive than development 11:33:40 ... support more languages than CLDR 11:33:49 ... 6000 languages are still not supported 11:35:53 ...
languages not supported in CLDR are not supported in applications (text editors, browsers) 11:37:33 ... looking for a solution 11:38:14 topic: Q&A session 11:39:34 xyz: How do you support users who are looking for content which is there, but users don't know the language? 11:40:21 Gerard: currently only the current language is being searched 11:40:56 ... ongoing project for searching in several languages at one time 11:41:38 Richard: Do you use BCP 47 or ISO-639-3? 11:42:01 Gerard: there is no difference between language and locale sometimes 11:42:37 ... BCP 47 is used when ISO-639-3 is not sufficient 11:43:30 Tomas: MediaWiki, Joomla, ... they solve similar problems, but solutions are different, there is no standard. 11:43:49 Gerard: we want to use standards, please help us to improve CLDR 11:44:51 Tomas: we are lacking a standard for multilingual websites 11:45:27 ... each system uses a different approach for translating content 11:45:41 Jan: this is the purpose of MLW and MLW-LT 11:47:26 Reinhard: How do you motivate localizers to work for free? 11:49:10 Gerard: Tools are not prepared for some languages 11:50:17 Brian: joomla is completely community driven, people wanted to build web sites in their languages 11:50:35 ...
we make it easy to supply translation for additional languages 11:51:32 Axel Hecht: in Mozilla each localization team has different motivations 11:52:39 xyz: are styles switched when translation is done on the fly? 11:52:46 Brian: yes, in joomla 12:53:21 Arle has joined #mlw 12:59:18 Dominic-Jones has joined #mlw 13:04:14 monica has joined #mlw 13:04:23 Jirka has joined #mlw 13:04:25 topic: Spyridon Pilos on machine translation in the EC 13:04:39 spyridon: new mt service at EC 13:04:45 .. work started in October 2010 13:04:54 scribe: chaals 13:05:20 ... we already have a system around open software for a lot of languages, being used since last July 13:05:45 ... I want to explain what we need in standards to make this work better 13:06:03 ... focus on openness and flexibility, and ensuring technological independence 13:06:14 ... (repeating what people have said, a bit) 13:06:28 [slide - service architecture] 13:07:13 ... We have users, and we want to connect data. We have organised the project in 3 action lines - the MT engines, working closely with the data part 13:07:15 aaronb has joined #mlw 13:07:28 ... Data part focused on preparation to improve output quality. 13:07:31 aaronb has left #mlw 13:07:47 r12a has joined #mlw 13:08:10 ... Our users are the Commission, and services funded by the Commission (eg ted - tender documentation) 13:08:42 ... For MT we started with Moses, because it is an EC-funded open source system, and started using it and collecting feedback. 13:09:13 ... We want to use more data, more MT technologies where Moses isn't the best so we want to be able to swap it out 13:09:21 ... handle post-editing, ... 13:09:27 ...
My focus is the data. 13:09:55 Nicoletta has joined #mlw 13:09:58 [slide - Multilingual Web = Multilingual Content] 13:10:35 ... An author, different translators who each have their own working methods, a publisher. 13:11:18 ... A different publisher might not work in the same way, so the content needs to adapt. 13:12:02 ... Publisher needs to be prepared to receive the different languages 13:12:19 [slide - Language Applications] 13:12:29 ... We want to give data to the web, and get it from the web. 13:13:02 ... Getting data from one website is easy. But adding a second source meant having to rewrite the systems, and if a site changes there is more work to do. And so on for each website. 13:13:16 ... Where there is no standard to follow, this is normal. 13:13:33 [slide - Giving our data to the Web] 13:13:56 ... We want a system that takes data from databases, and makes it possible to automatically publish in multiple languages. 13:14:15 ... There should be continuity in what users get. 13:14:48 ... We have had to make our own approach, and then we need to stick to it. 13:14:53 [slide - Conclusion] 13:15:14 ... We need to be able to get Multilingual information from, and publish it to the Web. 13:15:45 ... Need to allow free flow of information between applications without losing a lot of time on adapting data. 13:16:01 ... We expect MLW-LT to show a feasible approach, and demonstrate the benefits of this. 13:16:53 ... We are trying to be active (echoing richard's "tell us what you need"). And we are ready to change. 13:17:25 ... We have our internal systems, which we are ready to abandon for a broader standard if there is one. 13:17:39 ... So we are major users prepared to test, and to actively contribute in development.
13:17:44 TGraham has joined #mlw 13:18:16 Topic: Matjaž Horvat - Mozilla, Live Website Localization 13:19:46 MH: The difference between Slovenia and Slovakia: there is love in sLOVEnia 13:20:30 ... Seeing my name written Matjŧ got me involved in localisation 13:21:34 [slide - Existing Approach] 13:22:00 labra has joined #mlw 13:22:04 MH: We localise a lot of stuff at Mozilla. Usually we extract text, give the strings to localisers, and then post them back to the Web 13:22:30 [shows a website in English, how you translate the string, and what it looks like afterward] 13:23:01 ... Problem. Localisers don't see the context, which is a problem. And they don't see the available space. 13:23:16 ... What can we do? 13:23:55 ... In HTML5 we have contenteditable, which makes it possible to just change text on a website - e.g. translating things you see. 13:24:22 http://pontoon-dev/mozillalabs.com -> a development project to work with this. 13:24:32 [live demo based on flaky versions of everything...] 13:25:02 ... add a line of code to a site, then on the Pontoon side you can give the URL and start translating. 13:25:45 ... A UI at the bottom to manage the translation, and then you select some text, and edit it to change the language. 13:26:04 ... It's all cool. 13:26:07 ... Except... 13:26:25 ... How do you translate metadata like or error messages in JavaScript? 13:26:47 <chaals> ... We have an advanced mode that shows all the strings you have.
13:28:16 <chaals> Topic: David Lewis - Metadata interoperability between CMS, and the Multilingual Web LT project. 13:29:07 <chaals> DL: ML-LT follows from the multilingual web project. Get involved... 13:29:22 <chaals> ... There is a new W3C working group 13:29:46 <cvalds> cvalds has joined #mlw 13:30:22 <chaals> [slide - MultilingualWeb-LT] 13:30:42 <chaals> ... How do we make it easier to integrate content going through translation? 13:31:10 <chaals> ... Already getting uptake from people beyond the project. 13:31:48 <chaals> ... Started with a lot of representation from localisation industry, we could do with more input from Content Management and Users ... 13:32:02 <chaals> s/from/from more/ 13:32:49 <chaals> ... Key to the process is not just specification, but actual implementation. 13:33:23 <chaals> [slide - Approach] 13:33:44 <chaals> ... Heritage owes a lot to the ITS specification 13:34:59 <chaals> ... it is nice that it is small, but we could add some more useful information using this. 13:36:16 <chaals> ... What are the useful things to add? There are different things different people will want 13:37:15 <chaals> ... Looking at HTML5 compatibility, and things like metadata in CMS content for the 'deep web'. 13:37:29 <chaals> ... Don't want to invent new stuff where we can use things that already exist. 13:37:42 <chaals> [slide - Candidate Stakeholders] 13:38:10 <chaals> ... Main message: we need to look at the whole stretch from production to consumption. 13:39:11 <chaals> ... There are lots of players, and different ways of building the workflows. 13:39:38 <chaals> ... 
We want to find real requirements - problems that people actually have 13:39:46 <chaals> [slide - Scope of Use Cases] 13:40:16 <chaals> [slide - Source Content Processing] 13:40:40 <chaals> [slide - Localisation Quality Assurance] 13:41:14 <chaals> ... different approaches possible, and we need to think about e.g. what simple authors are doing, and how to work with people who have strong systems that need to integrate with e.g. XLIFF 13:41:36 <chaals> [slide - CMS-L10N integration via RDF and XLIFF] 13:41:59 <chaals> ... Exploring ways of working with formal systems for tracking the process 13:42:12 <chaals> [slide - Leverage Target Quality Metadata] 13:42:40 <chaals> ... There are some things that flow through the process, some things that are important for particular steps. 13:42:53 <chaals> [slide - Rich Metadata for translation] 13:43:00 <chaals> [slide - Next Steps] 13:43:32 <chaals> ... We're working in public, and we hope to get involvement as well as being transparent about what we are doing. 13:44:00 <chaals> ... Will hold a workshop in Dublin 11-12 June, getting close to finalising requirements 13:44:41 <chaals> ... And then there are more things to work on beyond the scope of this project - multimedia, javascript, etc 13:45:20 <chaals> Topic: Questions for Localisers. 13:45:59 <chaals> Reinhart: We wanted to be able to share translations and let communities rate and review them. 13:46:38 <chaals> MH: We were thinking of this, taking inspiration from Universal Subtitles that allows people to help provide video subtitles. Nothing to show yet though 13:47:53 <chaals> Des Oates (Adobe): In architecture of MTU you had what looks like an API between various MT engines. We're looking at something similar in Adobe.
Are you going to make those interfaces public, and are you interested in standardising the approaches? 13:48:12 <fsasaki> s/Dave/Des Oates (Adobe)/ 13:48:34 <chaals> SD: We're taking solutions supported by our institutional IT department. We're developing on the basis of commercial systems, building it to allow implementing rules for different types of request. 13:49:07 <chaals> ... if you have multiple MT engines for a given language, you call one or another based on e.g. the domain. But it is purely internal. 13:49:39 <chaals> ... This is something that is available, that has been customised for each client. I don't see interest in making the custom configuration standard. 13:50:15 <chaals> Lloyd: What kind of effort do you have in source quality in machine translation? 13:51:00 <chaals> SD: We are aware of the importance of quality. We have no way to impose rules on the sources. 13:51:43 <chaals> ... many users are drafting things that are not in their native languages. We have editing units to help, we are considering using authoring support, but in practical terms this looks extremely expensive to provide. 13:52:02 <chaals> ... we're very early in this process. 13:52:25 <chaals> DL: In ML-LT the question hasn't come up yet. I think it is an interesting use case. 13:52:47 <chaals> ??: Could you expand on the policy for open source? 13:53:36 <chaals> SD: Interesting question. There is a change in policy since December - now Commission documents are by default made available for everything, unless there is a clear justification for restricting access. 13:53:48 <chaals> ... There is a new open data initiative starting in line with this trend.
13:55:01 <chaals> Anonymous: MH, does the system give translation memory, how are translations reported back and integrated online, and can it be linked to other automatic translation services? 13:55:44 <chaals> MH: Right now it uses translation memory from our own localisation work. 13:56:25 <chaals> ... Linking to other machine translation services is possible - we switched already to the Microsoft service (although we only have that at the moment it is easy to switch) 13:57:04 <chaals> ... Integrating to the services. Pontoon can detect every text node, and you translate a page, or using getText to do localisation. 13:57:21 <chaals> ... so we create hooks for getText and use them to create metafiles. 13:57:51 <chaals> Topic: Tomas' demonstration of chaals' swaplang extension 13:58:38 <chaals> https://addons.opera.com/addons/extensions/details/swaplang -> extension that identifies pages which point to alternative languages, so users can select them. 13:59:01 <chaals> [It's open source - feel free to adapt, improve or port it] 13:59:16 <chaals> ??: Do you have problems with community participation? 13:59:41 <chaals> MH: Facebook has a similar tool. And I hear stories that there are fights in communities about whose translation should win. 14:00:08 <chaals> ... we don't use Pontoon with live sites yet. We could limit access, etc. 14:00:23 <chaals> ... but we want everyone to participate. Need to consider how to handle this. 14:01:03 <chaals> ???: Yes, this happens. At the end of the day we have to decide on who we accept - choose an authority, and then try to merge differences. 14:01:22 <chaals> Felix: MW-LT is on a very tight schedule. 14:01:44 <chaals> ... please tell us soon what you do and need and fill in the questionnaire at https://www.w3.org/2002/09/wbs/1/mlw-lt-requirements/
14:01:58 <chaals> [applause for the whole panel] 14:02:53 <RyanHeart> The Machines panel is about to start, chaired by Felix. 14:03:11 <chaals> scribeNick: RyanHeart 14:03:31 <RyanHeart> Topic: Peter Schmitz of the EU Publications Office is reporting on Common Access to EU Information based on semantic technology. 14:04:12 <RyanHeart> PS: Mission: From the EU to the public. 14:04:36 <fsasaki> s/what you do and need/what you do and need and fill in the questionnaire at https://www.w3.org/2002/09/wbs/1/mlw-lt-requirements/ 14:04:46 <RyanHeart> ... Production of publications and preparation of publications in all EU languages. 14:05:01 <RyanHeart> ... Different types of publications: Official and non-official. 14:05:28 <RyanHeart> ... Official journal: 866 issues, 22/23 languages with > 1m pages. 14:05:47 <RyanHeart> ... Consolidation of EU law is another area of work. 14:06:14 <RyanHeart> ... Different online services are also provided: EUR-Lex (law), bookshop, etc. 14:06:59 <RyanHeart> The idea behind the CELLAR project is to create one single repository for all metadata. 14:08:04 <RyanHeart> Peter illustrates the structure of the CELLAR project with the target architecture consisting of a portal, index and search, content and metadata, post production and production layers. 14:08:51 <RyanHeart> Peter highlights the dual nature of the repository in CELLAR, covering both content and metadata. 14:09:23 <RyanHeart> The system has passed its development stage, according to Peter, and is now deployed.
14:10:06 <RyanHeart> Another common portal is being developed, outlines Peter, to provide a better and easier-to-use interface to CELLAR. 14:10:55 <RyanHeart> The CELLAR project uses a common data model, an ontology based on the FRBR model. 14:12:01 <RyanHeart> Peter explains that the CELLAR project uses RDF and taxonomies represented in SKOS. 14:12:21 <fsasaki> FRBR = Functional Requirements for Bibliographic Records 14:13:32 <RyanHeart> Coded metadata supports the delivery of multi-lingual content, explains Peter,... 14:13:54 <RyanHeart> … which is also used to index the content. 14:15:49 <RyanHeart> Interoperability is achieved by adopting standards as much as possible, such as METS (Metadata Encoding and Transmission Standard), Dublin Core, FRBR, Linked Open Data (LOD) and the SPARQL query language, according to Peter. 14:16:21 <RyanHeart> At the same time, the EC also contributes to the development and definition of standards, says Peter... 14:17:36 <RyanHeart> … including around core metadata (to enable global reach), using common authority tables (to harmonize metadata), and driving an exchange protocol for EU legislative procedures. 14:18:10 <RyanHeart> The European Legislative Identifier (ELI) is under preparation, says Peter. 14:19:01 <RyanHeart> Next speaker is Paul Buitelaar of DERI at the National University of Ireland, Galway. 14:19:27 <RyanHeart> Paul is presenting on the Ontology Lexicalisation and Localisation for the Multilingual Semantic Web 14:20:00 <RyanHeart> It's about accessing business information across languages, says Paul. 14:21:18 <RyanHeart> SAP is a partner in the project building a business analysis tool based on the DERI approach, according to Paul... 14:21:47 <RyanHeart> … showing an example of how the system, called Monnet, is working.
14:22:18 <RyanHeart> Ontologies cannot be directly translated, says Paul, who describes how a lexicon is used for translation. 14:23:57 <RyanHeart> The research objectives of Monnet, says Paul, are around the development and use of multilingual ontologies and the exploitation of domain semantics to improve MT. 14:24:56 <RyanHeart> Paul: the financial use case for the Monnet project is 'Harmonizing Business Registration across Europe' using XBRL and xEBR. 14:26:49 <RyanHeart> Paul: the methods used for domain training of term translation include hybrid methods, including domain lexicon generation from Wikipedia & domain parallel corpora, LDA topic modeling with features mixed-in from the ontology etc. 14:29:06 <RyanHeart> Paul: another use case is that of public services in The Netherlands, presenting different requirements and complex semantics. 14:30:10 <RyanHeart> Paul: GELATO (Generation of LAnguage and Text from Ontologies) is one of the methodologies used. 14:30:49 <RyanHeart> Paul: Ontology Lexicalisation is one of the central topics in the Monnet project. 14:31:40 <RyanHeart> Paul: there are a number of different use cases in this area in Ontology Localisation, Ontology-based information extraction etc. 14:32:26 <RyanHeart> Paul: The project is working with the W3C Ontology-Lexicon Community Group and has proposed its own 'Monnet' format. 14:33:29 <RyanHeart> Felix: The final speaker in this session is Tadej Štajner from the Jožef Stefan Institute talking about Cross-lingual named entity disambiguation for concept localization.
14:33:56 <fsasaki> s/Felix,/Felix:/ 14:34:10 <RyanHeart> Tadej: Translating proper names is a big problem for statistical MT systems, one that cannot be solved by the HTML5 translate attribute. 14:34:55 <RyanHeart> Tadej: Depending on the source and target languages, there are different rules for the translation of proper names. 14:35:09 <fsasaki> topic: presentation from Tadej Štajner 14:35:29 <RyanHeart> Tadej: One solution for this problem is to check whether a translation for an entity already exists. 14:36:10 <RyanHeart> Tadej: The information presented in a document is checked against a knowledge base and disambiguated. 14:36:51 <RyanHeart> Tadej: The knowledge base contains labels and entities. 14:37:51 <RyanHeart> Tadej: This requires a good coverage of entities in the knowledge base (kb) and works better in more widely used languages. 14:38:22 <RyanHeart> Tadej: A solution for languages without a wide coverage would be to use a kb that is in a different language from that of the document. 14:38:58 <RyanHeart> Tadej: There are a number of different ranking features that could be used, including popularity and context similarity. 14:40:17 <RyanHeart> Tadej: For example, if Kashmir was used close to Led Zeppelin, it would be obvious that the song rather than the country was referred to. 14:41:11 <RyanHeart> Tadej: Cross-lingual gathering of candidate entities only works for proper names and only if they are not translated to local languages. 14:41:58 <RyanHeart> Tadej: Context similarity works in a vector space, treating the distinct words as dimensions. This does not work across languages. 14:42:27 <RyanHeart> Tadej: The solution is to not compute similarity but to map texts. 14:43:27 <RyanHeart> Tadej: This can be achieved by training on parallel corpora with Canonical Correlation Analysis (CLIR) techniques. This has been implemented for EuroParl.
14:44:48 <RyanHeart> Tadej: Future work proposed includes that of the FP7 project XLike and the standardization work in the W3C Multilingual Web - LT Working Group. 14:45:46 <RyanHeart> Tadej: The annotations can be used in HTML and are transparent for normal CMS operations and web browser rendering. 14:46:35 <RyanHeart> Tadej: I am now going to do a demo of RDFa Lite, enrycher.ijs.si 14:48:19 <RyanHeart> Applause 14:48:49 <RyanHeart> Felix: Thank you, Tadej. Any questions? 14:49:32 <RyanHeart> Ivan: Question about CELLAR project. You create a silo, but do you produce links to other data sets, such as government data? 14:50:43 <RyanHeart> Peter: You are right. We are aware of this and would, indeed, be interested in linking up with other similar public data repositories. 14:51:32 <RyanHeart> Joerg: Peter, is there any established interaction with DG Translation, as you share a lot of architectural and data management issues? 14:51:50 <RyanHeart> Peter: What is your organization? Ah, Bioloom. 14:52:09 <RyanHeart> Peter: DG Translation is one of our customers, in a sense. 14:52:45 <RyanHeart> ?: A question for Paul. Domain lexicon generation from Wikipedia - how did you do it? 14:53:31 <RyanHeart> Paul: we looked at the terms to be translated and extracted them. Then went to the domain-specific Wikipedia entries and to other languages and retrieved the translations. 14:54:50 <RyanHeart> Olaf: A question for Tadej. In relation to name disambiguation - what have you done in relation to cities that exist in different countries, such as Vienna or Wien? 14:55:08 <RyanHeart> Tadej: We look at the context. 14:55:36 <RyanHeart> Tadej: Therefore, Vienna in the USA would not be confused with Wien in Austria. 14:56:36 <RyanHeart> Christian: A question for Paul and Tadej - you first identify language-neutral entities; then you do not use MT, but what do you use? 14:56:48 <RyanHeart> Paul: we actually do MT.
14:57:43 <RyanHeart> Tadej: There are people approaching the same problem using MT, and it works reasonably well. 14:58:13 <RyanHeart> Tadej: But my point is that we do not have to use MT, that we can use a cheaper approach and achieve very similar results. 14:58:37 <Jirka> Jirka has left #mlw 14:58:41 <RyanHeart> Felix: Let me thank again all the speakers. Please be back at 16:30 for our next session. 15:09:26 <Arle> Arle has joined #mlw 15:09:58 <Arle> scribe: Arle 15:29:25 <Arle> topic: Presentation by Annette Marino 15:32:29 <Arle> scribeNick: Arle 15:32:54 <Arle> Annette: Will discuss web communication and its importance for citizens 15:33:33 <Arle> ... We are lucky to live in democratic societies, but we should not take it for granted. Many do not enjoy freedom. 15:33:40 <RyanHeart> Great to hear about 'citizens' rather than 'customers'. 15:34:29 <Arle> ... Choosing leaders is not enough. Citizens need to participate. The internet provides a way for citizens to interact with leaders. For the EU the importance of good web communication cannot be overestimated. 15:34:45 <Arle> ... But how can we communicate with citizens if we don't speak their language? 15:35:31 <Arle> ... Fortunately for the EC, we have a specialized web translation service in the DGT that helps with communication and assists in redesigning websites to assist citizens. 15:35:50 <Arle> ... We don't just translate, but also localize the whole message with the target country in mind. 15:36:14 <Arle> ... Our team has small, autonomous teams for each language. The lines between planners and translators are short to increase participation. 15:37:14 <RyanHeart> A 'human' translator speaking. A first, after close to two years :) 15:37:15 <Arle> ... (Dutch translator who is speaking is not on list of speakers.) Human translators are underrepresented in this discussion. [Asks for show of hands about different audience profiles.] 15:37:41 <Arle> ... Want to discuss what we do as translators.
We try to get people to consider multilingual needs from the start, to keep it in the back of the mind at all times. 15:38:00 <Arle> ... That's why we fight to keep content short and simple, think about consequences in other language versions. 15:38:13 <Arle> ... Keeping things short and simple in the Commission can be difficult. 15:38:36 <Arle> ... We face the challenge of matching formats with our tools. We lag a bit, but our web masters keep wanting to add new tools. 15:38:53 <Arle> ... Tools are improving, but it is often a challenge for translators to know what to translate and what not to translate. 15:39:02 <Arle> ... There is a steep learning curve. 15:39:13 <Arle> ... [Back to Annette] 15:39:31 <Arle> ... Since we cannot translate everything, we have to choose priorities carefully. We focus on top-level pages and navigation. 15:39:57 <Arle> ... For specialist/niche pages, MT may do, but for information going to a large audience, content that is multilingual and user-friendly in the local style is required. 15:40:08 <Arle> ... The bigger the audience the higher the profile. 15:40:24 <Arle> ... We need to understand how citizens use the web and social media to help make the best decisions. 15:41:01 <Arle> ... Quality assurance is our goal. We have to check closely. This requires close collaboration with web teams. QA work is time-consuming and expensive, but hard to quantify. 15:41:12 <Arle> ... [Back to Dutch translator] 15:42:04 <Arle> ... Now I want to share some examples of what we do. We have huge volumes of legislation, but you will not read laws in EUR-Lex, so we have a portal with short, concise information that covers practical needs for citizens. 15:42:32 <giuseppe> giuseppe has joined #mlw 15:42:49 <Arle> ... We try to put out as much national information from authorities as possible to make this a one-stop shopping site for information where citizens can find it all. 15:43:34 <Arle> ... This is tricky: 23 languages from 27 countries.
If there are too many languages on a page, you can't use it. What would happen if you found Maltese when you need another language? Some human intervention is essential. 15:44:19 <Arle> ... Another example: website on legislation that allows you to propose citizens’ initiatives: if you get 1,000,000 signatures, the EU is obliged to propose a law. 15:44:34 <Arle> ... We cannot use MT for this since it could invalidate efforts. 15:45:13 <Arle> ... Last example is the Commission home page: we try to translate as much as possible. 15:45:58 <Arle> ... We do not just translate: we localize. For example, if a Portuguese museum wins an award, we might not translate it for a Dutch user, but instead put some local content. 15:46:08 <Arle> ... [Back to Annette] 15:46:39 <Arle> ... We deploy our multilingual expertise in service of citizens. We are translators, but first and foremost communicators in service of citizens. 15:46:47 <Arle> topic: Presentation by Murhaf Hossari 15:46:48 <fsasaki> fsasaki has joined #mlw 15:47:15 <Arle> Murhaf Hossari: I work for Apple on localization and did studies in Dublin. 15:47:55 <Arle> ... I will talk about why right-to-left (RTL) languages are important, and about best practice. 15:48:45 <Arle> ... To start with, I want to talk about a friend who wanted to do software business in the Middle East. [Shows examples of promotions that don't work because the cartoon shows the solution messing things up] 15:49:04 <chaals> chaals has joined #mlw 15:49:06 <Arle> ... The whole flow, right-to-left, means that the whole screen flow needs to be reversed. 15:49:31 <Arle> ... 
[Shows screen shot with UI mirroring from OS X Lion] 15:49:53 <fsasaki> scribe: Arle 15:50:26 <Arle> Murhaf: To make a site compatible for Hebrew and Arabic, everything must be adjusted 15:50:53 <Arle> Murhaf: Everything must be right-aligned. 15:51:42 <Arle> ... You need directionality support for text. [Shows example in Roman characters of RTL, LTR, and bidirectional text] 15:52:31 <Arle> ... The Unicode bidirectional algorithm (UBA) can handle display. Text can be entered the same way (first character first), but is displayed properly. 15:53:13 <Arle> ... The algorithm reorders the characters in the way the user would expect based on the language. 15:53:54 <Arle> ... It has a set of rules to try to change the order from the input string to what the user expects. [Shows example of reordering rules] 15:56:32 <Arle> ... UBA does a good job in most cases. But there are a few cases where it does not. E.g., if the paragraph direction is not detected correctly based on first character; if strings with different directionality are nested in difficult ways; if strings contain numbers, names, etc.; strings that are ambiguous for humans as well. 15:56:54 <Arle> ... If we can improve the difficult cases, it would be a great goal. 15:57:37 <Arle> ... [Shows example in which “Apple” is the first word in an Arabic string, which sets Left-to-Right as direction, but it should be RTL.] 15:58:27 <Arle> ... [Example in which “Yahoo!” is separated from the ! because of ordering; also one in which file extension is in the wrong area.] 15:58:46 <Arle> ... [Shows example in which parentheses end up in the wrong place] 15:59:25 <Arle> ... Right now you can use extra markup, tags, Unicode control characters to force behavior, but this is manual action and based on experience. 16:00:03 <Arle> ... 
The problem with manually adding them is that the translator may not know what to do and they are not easy to use since they require knowledge about the UBA. They are invisible, which means they may be lost, breaking the string. 16:00:14 <Arle> ... Sometimes there is no way to check until runtime. 16:00:53 <Arle> ... UBA needs to be improved based on studying cases where problems occur. We should find patterns and then parse strings to improve behavior. 16:01:22 <Arle> ... Numbers are difficult. People think they do not change, so they may hard code them, but Hindi numbers are used in Arabic, for instance. 16:02:11 <Arle> ... Best practices include: site must support RTL, avoid composed strings, avoid weak and neutral characters that cause UBA problems, don't enforce direction, support localized numbers, support multiple locales. 16:02:28 <Arle> ... In Tunisia they use Western numbers. Other places use others. 16:03:05 <Arle> topic: Presentation by Nicoletta Calzolari 16:03:25 <Arle> Nicoletta: I will speak on the Multilingual Language Library. 16:03:39 <Arle> ... It is at the heart of the Multilingual Web 16:04:02 <Arle> ... The motto is “Let’s build it together!” Community involvement is critical. 16:04:42 <Arle> ... We want to make more use of the trend for sharing. Part of META-SHARE for resources. It is a big step, but not enough. We need to move to collaborative resources. 16:05:15 <Arle> ... Interoperability gains priority in this scenario. 16:06:02 <Arle> ... NLP is data intensive. Annotation is at the core of training, testing, etc. But our community efforts are scattered and dispersed, with insufficient opportunity to exploit results. 16:06:47 <Arle> ... We want MANY (parallel?) tests/data for MANY languages.
We want to support all possible types of processing and annotation we may be able to produce as language technology people. 16:07:07 <Arle> ... For example, annotation about time, space, etc. 16:07:19 <Arle> ... It is a step toward making our discipline more like mature sciences. 16:07:52 <Arle> ... Those disciplines have thousands of people working together on the same experiments. We aren't there yet, but to be mature, we must be able to do this if we are to make a step forward. 16:08:21 <Arle> ... Accumulation of massive amounts of multidimensional data is the key to foster advancement in our knowledge about language and its mechanisms. 16:08:41 <Arle> ... We do not want isolated resources. They need to be bound together and their relationships examined. 16:09:18 <Arle> ... We want to create an infrastructure for a large language repository to accumulate all knowledge about a language and encourage analysis of interrelations. 16:09:32 <Arle> ... We cannot currently share this knowledge. 16:09:51 <Arle> ... The challenges are not technical or at the design level. They are at the organizational level, in community involvement. 16:10:46 <Arle> ... We are starting with the LREC Repository that hosts a number of parallel/compatible resources in as many languages as possible, focusing on multiple modalities (speech, text, etc.) 16:10:57 <Arle> ... This will be contributed to META-SHARE. 16:11:12 <Arle> ... Authors are invited to process data in languages they know. 16:11:26 <Arle> ... They are invited to focus on different sorts of processing/tagging that they know. 16:11:38 <Arle> ... Processed data is shared back with the project. 16:12:05 <Arle> ... We currently offer data in 64 languages. English has the most, followed by Spanish and Catalan. 16:12:12 <Arle> ... There are many missing languages. 16:12:30 <Arle> ... [Shows table of annotation types.] 16:12:46 <Arle> ... [Shows table of tools used for annotation] 16:13:27 <Arle> ... [Shows table of standard formats.
Heavy use of TIMEX3 for temporal data markup.] 16:13:40 <Arle> s/[Shows/...[Shows/ 16:13:55 <Arle> ... All data will be available publicly. 16:14:16 <Arle> ... This is our first experiment. We hope it will set the ground for a large language library. 16:14:40 <Arle> ... It will help us build all knowledge and let us build on each other’s achievements. It requires a change of mentality. 16:15:03 <Arle> ... We need to focus on a collaborative mindset. 16:15:19 <Arle> ... Interoperability issues are a problem since we do not require conformance to any standard. 16:15:38 <Arle> ... Please contribute at http://languagelibrary.eu 16:15:48 <Arle> topic: Presentation by Fernando Serván 16:16:37 <Arle> Fernando: I will start with context about the Food and Agriculture Organization of the United Nations. 16:16:48 <Arle> ... We have over 190 member countries. 16:17:07 <Arle> ... Focus on aspects of agriculture, food standards, animal diseases, etc. 16:17:22 <Arle> ... 5 regional offices. Work in a number of languages. 16:17:41 <Arle> ... See www.FAO.org as portal. But we are working in Facebook, Twitter, etc. now. 16:18:07 <Arle> ... [Shows table of users by browser language] 16:18:24 <Arle> ... English is dominant (53%) but other languages are growing. 16:19:15 <Arle> ... Our issues with language call for use of MT. We produce governing bodies’ statutory documents, food standards, news and campaigns, technical information, internal communication. 16:19:35 <Arle> ... We need to make this available in all languages, but our budget is small. 16:20:46 <Arle> ... We use human translation for governing bodies’ documents, and normative documents. We may use MT + post-editing for the other groups, but we want to get to the point where we do not need human intervention. 16:21:32 <Arle> ... We have been testing MT (Moses). We want to reuse legacy translations.
We want to integrate TM and MT and use our knowledge and experience to improve the production of multilingual content for the web. 16:21:47 <Arle> ... We want to improve the efficiency of the translation process. 16:22:12 <Arle> ... Not all content can be translated by humans in all languages; we need to accelerate the process, particularly for legal documents. 16:22:53 <Arle> ... We started with allowing users to send queries to the engine and provide translated responses. By monitoring the requests, we would get a better view of what content is demanded. 16:23:05 <Arle> ... This knowledge would help us focus our resources. 16:23:22 <Arle> ... Started with Spanish (for expertise) and Arabic (critical demand) 16:23:38 <Arle> ... [Shows slide of architecture] 16:23:47 <Arle> ... We used TBX, TMX, etc. to use standard formats. 16:24:14 <Arle> ... However, after trying this, we found out that the best format to fit SMT is plain text (.txt) aligned in a certain way. 16:24:32 <Arle> ... We were moving from rich formats to non-rich formats. 16:25:03 <Arle> ... The engine requires that the text be cleaned up from markup. It reduces the information in the available translations for use by the engine. 16:25:56 <Arle> ... Some issues we have found are: (1) there is little information about MOSES for mere mortals; (2) best practices are not documented in the UN network of practitioners. 16:26:08 <Arle> ... We have shared experience in JIAMCATT. 16:26:18 <Arle> ... We found that there are common problems. 16:27:01 <Arle> ... Other issues: there are standards for each part of the process, but they do not integrate with each other, raising interoperability problems. They do not work well together. 16:27:21 <Arle> ... For us, the translate attribute is useful, but what do we do when we have to convert to plain text? 16:27:40 <Arle> ... directionality is a problem, as are numbers, acronyms. (E.g., in Arabic, acronyms are not used.) 16:28:25 <Arle> ... 
For our texts, English is the source language, but it is written by those for whom English is not a native language. Thus the source is “UN English” but the translations are in native languages. It can create quality problems. 16:28:53 <Arle> ... We are watching the MultilingualWeb-LT project and hope it will help us bring more content to more languages. 16:29:00 <Arle> topic: Questions and Answers 16:29:28 <RRSAgent> I have made the request to generate http://www.w3.org/2012/03/15-mlw-minutes.html fsasaki 16:29:51 <Arle> Gerard Meijssen: For Nicoletta, is the information in the repository freely licensed? 16:30:09 <lbellido> lbellido has left #mlw 16:30:15 <Arle> Nicoletta: It is not a repository of translations, but of language resources. 16:30:35 <Arle> ... MT results could be one resource that could be contributed. 16:31:01 <Arle> Gerard: The data is in the LREC repository, but it is available under a free license where you can do anything with it? 16:31:53 <Arle> Nicoletta: They are available for everyone, but if you process the data and voluntarily contribute your processing back, you have to make it available. You can set licensing that ensures availability. 16:33:02 <Arle> Tomas Carrasco: MOSES for Mere Mortals is from a member of our team. Keep your data, but use open formats. Legal issues can be difficult, but instead we should focus on agreeing on formats so we can share as needed. Sharing data is not enough. 16:33:13 <lbellido> lbellido has joined #mlw 16:34:06 <Arle> Nicoletta: Let me clarify. We provide data and we ask users to process the data (add annotations). It is all through the META-SHARE platform. The reason is we want to see the results and analyze what we get. We do not ask for a specific format at this point because that is a top-down approach but we want to see what the community does on its own. 16:34:21 <Arle> ... We know that best practices, standards will emerge. It's a different approach. 
16:34:48 <Arle> Daniel Garcia: For Annette, are you involved with translation of social media? 16:34:54 <Arle> Annette: No. 16:35:40 <Arle> ??: The LREC initiative is great, but have you considered the issue of the quality of the data you are getting? I assume the collection should be reused, but if you don't know the quality, there is not much use of it. 16:36:11 <Arle> Nicoletta: That is part of the experiment. We need to analyze the data for quality so we can understand the issues that will arise on a bigger scale. 16:36:26 <fsasaki> s/??:/danT: 16:36:28 <RRSAgent> I have made the request to generate http://www.w3.org/2012/03/15-mlw-minutes.html fsasaki 16:36:48 <Arle> ... One possible way it may go is that when you have many layers of annotation, if there are many groups you can look at the issue in many ways. 16:38:10 <Arle> ??: For Fernando. How do your users cope with MT quality? Are metadata from databases (e.g., descriptions, keywords, etc.) translated to provide accessibility even for non-translated materials so that users can know about the availability of data? 16:38:58 <Arle> Fernando: We use MT only internally for the time being. The results do not go beyond our intranet. Quality is an issue, and because people are used to human translation, we don't want to expose ourselves to risk until we know the results. 16:39:34 <Arle> ... For document production, we translate titles, etc. Much uses controlled vocabulary. We use controlled syntax for URLs. 16:39:47 <Arle> ... We use only English metadata at present in the CMS. 16:40:15 <Arle> Jörg Schütz: Does SKOS play a role in your efforts? 16:40:20 <RRSAgent> I have made the request to generate http://www.w3.org/2012/03/15-mlw-minutes.html fsasaki 16:40:29 <Arle> Fernando: We use terminology database, other resources. 16:40:59 <Arle> Jörg: For Annette, you mentioned the notion of a default language. How do you decide what it is? 16:41:10 <Arle> ... Is the fall-back always English? 
16:41:18 <Arle> ??: Generally yes. 16:41:24 <Arle> Jörg: That matches my experience. 16:41:41 <Arle> [Applause for speakers] 16:42:46 <lbellido> lbellido has left #mlw 16:43:44 <Arle> Richard: Provides information on reception at Parc Bellevue. If you want to take the bus, take the #18, #12 and go to the Homelius stop. Go further down the road in the same direction. Take the second on the right to Ave. Marie-Theresie. The room is the Salle Marie-Theresie. 16:44:16 <RRSAgent> I have made the request to generate http://www.w3.org/2012/03/15-mlw-minutes.html fsasaki 16:45:02 <RRSAgent> I have made the request to generate http://www.w3.org/2012/03/15-mlw-minutes.html fsasaki 16:45:17 <Dominic-Jones> Dominic-Jones has left #mlw 16:48:56 <AndroUser2> AndroUser2 has joined #mlw 17:35:43 <lbellido> lbellido has joined #mlw 17:36:56 <lbellido> lbellido has left #mlw 18:01:47 <gischy> gischy has joined #mlw