07:12:59 logging to http://www.w3.org/2012/06/11-mlwDub-irc
07:25:05 meeting: MultilingualWeb workshop
07:25:09 chair: DaveLewis
07:25:14 scribe: various
07:25:19 agenda: http://www.multilingualweb.eu/documents/dublin-workshop/dublin-program
07:25:43 present: manyPeople, many_people, many-people
07:25:58 topic: welcome
07:26:12 session to start 9 a.m.
07:26:36 I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
08:02:31 welcome by Dave Lewis
08:03:43 introduction by Vincent Wade
08:03:44 David Lewis opens conference!
08:04:03 vincent: welcome to Dublin and TCD, delighted to host this workshop
08:04:13 .. the MLW and the linking of data across the MLW are key to the expansion of the web
08:04:37 .. in CNGL, we are looking into a value chain from creation to delivery
08:04:44 .. how multilingual web content can be integrated
08:05:01 .. technology on language and multimedia content, personalization
08:05:10 .. etc. need to be brought together
08:05:30 .. happy to see so many CNGL partners here, collaborating nationally
08:05:50 .. Science Foundation Ireland has invested a lot in CNGL, DERI focusing on the SW
08:06:09 .. in FP7 and in collaborations across the world, we increasingly have these roadmap meetings
08:06:28 .. we are looking into similar problems, so we need to find roadmaps to work together
08:06:33 .. multilinguality is a key part of this
08:06:57 now introduction by Richard Ishida
08:06:57 maximizing impact of our efforts <- key final point of Prof. Wade's talk! very important!
08:07:26 richard: 5th MLW workshop, very happy to see it taking place here
08:07:40 .. we run the MLW project with the help of the EC,
08:08:11 .. the idea was to bring people from different disciplines together so that they talk; it worked very well
08:08:47 .. during this workshop we will be more focused, but later we will go back to the general MLW workshop type again
08:09:31 .. 12 years ago Yves Savourel and I started talking about internationalization and localization of schemas, which led to the ITS standard; great to see how far we have come
08:09:40 .. thanks a lot and have a good meeting
08:09:45 intro by Dave Lewis
08:10:06 dave: multilingualism and web content are crucial for many businesses:
08:10:20 .. media providers, richer video and audio, content providers like Microsoft
08:10:25 David Lewis: success of MLW workshops: getting people together who might not otherwise have met and worked together.
08:10:25 .. CMS providers, browsers, ...
08:10:39 .. all increasingly need to be aware of multilingual content
08:10:57 .. aim to get people into the room from industry and academia, and from different parts of the topic
08:11:25 .. language technology, localization, web people, with W3C as a core place where people meet
08:11:31 .. and where we advance things to standards
08:11:41 .. another key player from a European perspective is the EC
08:11:58 David Lewis: core of multilingual issues is W3C ... to advance standards. Other core player is the EC, which provides ongoing support.
08:11:59 .. coordination activities have an important role
08:12:12 .. important both for research and infrastructure support
08:12:24 .. now FP7 is ending, looking forward to Horizon 2020
08:12:37 .. it has many opportunities to bring things together
08:12:44 EC's Framework 7 will be followed by Horizon 2020 (not next Framework)
08:13:24 dave: today please think mostly about bridge building: between various disciplines, industry and research
08:13:50 .. and esp. the two themes: MLW (HTML5, going into the language services industry) and linked (open) data
08:14:18 .. at the end of the day we have a couple of questions to lay out a roadmap, so keep these questions in mind
08:14:22 intro from Kimmo
08:14:47 kimmo: project officer of the series of workshops - MLW project and MLW-LT project
08:15:22 .. very grateful to Richard for bringing this community to where we are,
08:15:36 .. a few words about our internal re-organization
08:15:52 .. "my" DG will now be called "DG CONNECT"
08:16:18 .. in three weeks three units will be merged into the "data value chain" unit
08:16:41 .. these units were LT, data, and PSI (previously E1, E2, E4)
08:16:56 .. LT portfolio will continue to exist, but in a bigger context
08:17:32 .. E4 does not have projects, so the new G3 unit will handle the E1 and E2 projects
08:17:50 .. new activity of our unit: we will handle policy and legislative issues of public data
08:17:58 .. so-called PSI, "open public governmental data"
08:18:18 .. also, our unit will handle two infrastructures of the Connecting Europe Facility (CEF)
08:19:19 kimmo: LT community needs to see how they can leverage linked data, SW, big data ...
08:19:30 .. these are just keywords, here are just some threads:
08:19:45 .. extracting meaning from text, deriving structured data from unstructured data
08:19:57 .. hard task, but we can move forward by bringing the LT and data communities together
08:20:27 .. a lot of useful work on terminology, ontologies, taxonomies, nomenclatures etc.
08:20:55 .. very happy that this workshop opens with the colorful speakers related to that area
08:21:14 .. a few more words on CEF - it is about infrastructure
08:22:08 .. CEF includes early designs for 8 infrastructures, e.g. Europeana, multilingual access etc.
08:22:16 .. this is not research, but building systems
08:23:35 about 78 MEUR will be available in 3 calls to be published in July 2012
08:24:27 Obj. 4.1 (27 MEUR), 4.2 (31 MEUR) and 4.3 (20 MEUR).
08:24:29 kimmo: objectives - content analytics and LT, scalable data analytics, SME initiatives on analytics
08:24:37 4.1 Content analytics and lang tech
08:24:46 4.2 scalable data analytics
08:24:56 4.3 SME initiative on analytics
08:25:13 .. our (LT) role is about extracting meaning from large volumes of language-based information
08:25:58 kimmo: I'm here the whole day, if you have questions please let me know
08:26:06 Kimmo Rossi only here today. Invites everyone to come to see him today if interested / have ideas
08:26:09 dave - now short self-introduction
08:33:30 dave: very interesting mix of people here
08:33:44 .. very happy to also have XLIFF TC people here, who will have a meeting here later in the week
08:33:54 .. so we have good expertise from OASIS on the localization side too
08:35:03 dave checking - who is from industry, research, standardization
08:35:20 then - who is on the LT or SW research side
08:35:45 dave: we have done a good job in bringing the people here
08:37:17 arle: look at IRC - this is where we take the meeting minutes, but it is also used to gather comments
08:38:14 arle: crowdsourcing content creation - go and write your ideas on the board, people leading the workout sessions will make use of that
08:40:14 topic: setting the stage - presentation from David Orban
08:41:36 davidO: exponential trends:
08:41:51 David Orban (dotSUB): challenge is understanding the power of exponential trends
08:42:08 .. in the initial part of the exponential function, people easily say "this is just noise"
08:42:20 .. famous example is the Human Genome Project
08:43:59 davidO: I'm a geek, loving to observe the nature of machines
08:44:01 naysayers have an easy time initially. It takes time to reach a point of visibility. 1% often takes "most" of the timeline. Then the exponential curve gets the rest done more quickly
08:45:01 We've gone from mainframes through several levels of devices, to reach the final generation of human-oriented devices.
08:45:35 The latest, mobile phones/devices, number in the hundreds of millions
08:45:48 the next generation will outnumber the number of people
08:45:55 the next...
08:46:18 davidO: communication among "things" on the internet, talking to humans in emergency or other situations
08:46:32 .. these devices are going to dominate the future computing universe
08:46:43 in this "next age", the age of autonomous machines, machines will communicate more with each other than with humans
08:47:11 davidO: autonomous devices already exist, they communicate with each other and with us
08:47:32 e.g. of autonomous devices: iRobot autonomous vacuum cleaner
08:47:43 .. what are the communication signifiers that enable us to operate a vacuum cleaner or a mobile phone?
08:48:03 .. what decisions do autonomous cars make?
08:48:09 .. these are fundamental challenges
08:48:34 .. all developments are very chaotic - standards setting and policy making are essential for this
08:48:55 in terms of policy making, decisions are very chaotic ... industry wants to thrust ahead.
08:49:25 .. consumers jump on board - it is important to balance advantages of new technologies with potential pitfalls
08:49:30 users want the devices, without thinking too much about consequences
08:49:45 Google Glass as e.g. of augmented reality
08:49:46 .. hard for policy makers to keep up
08:50:19 de facto industry standards will be faster to develop than the standards that standards bodies develop.
08:50:31 .. technology developments - augmented reality interfaces, Google Glass etc.
08:50:32 the former may influence the latter
08:51:58 "Code is Law" Lawrence Lessig.
08:52:07 davidO: need of human societies to interact in a positive way - not in a relationship of winners and losers
08:52:38 ... from CODE and other laws of cyberspace (http://code-is-law.org)
08:52:59 .. semantic web can accelerate understanding of this
08:53:20 .. many have seen results that Google exposes - creating Wikipedia-like pages from accumulated search results
08:53:42 .. bring the human component of Wikipedia-like large-scale cooperation together with semantic web processing
08:54:00 .. we are creating an interoperable, hybrid, very powerful system
08:54:31 Human-computer interoperability is coming! Wikipedia-like pages created using semantic web tools from Google-like data
08:54:48 .. we are creating the premises of new types of social interactions, which can be abstracted to a political level
08:55:17 DavidO: recommends the "Proactionary Principle"
08:55:51 Developing a balanced approach to decision making.
08:56:34 topic: presentation from Peter Schmitz
08:56:48 peter: building a new architecture in the Cellar project
08:56:58 .. also dealing with metadata standardization, format standardization
08:57:17 .. esp. in the legal domain there are many XML-based structures
08:58:43 .. re-use policy of the EC
08:59:08 .. purpose is to increase efficiency in the EC
08:59:17 .. currently developing an open data license
08:59:31 .. Publications Office: publisher of the EU
08:59:55 .. EC, European Parliament, other institutions
09:00:06 .. publishing in 23 languages
09:00:35 .. main public online services - EUR-Lex, EU Bookshop, public procurement, R&D on CORDIS
09:00:47 .. position of the Publications Office on re-use and SW / linked data:
09:01:14 .. we are part of the EC, so we are part of the execution of this initiative
09:01:26 .. we have a re-use policy led by DG INFSO / future DG CONNECT
09:01:59 .. around autumn the first version of the European data portal will be online
09:02:13 .. EU level: standardization participation esp. in the legal domain
09:02:37 .. topics / ideas of re-use of language resources in the NLP domain
09:03:32 .. our contributions: multilingual thesauri like EuroVoc, multilingual controlled vocabularies and taxonomies ("common authority tables"), linked multilingual XHTML content (Official Journal, case law)
09:03:58 .. all these will be provided for re-use in the new dissemination architecture
09:04:11 .. legal content started in SGML, now XML, being converted into XHTML
09:04:44 .. content delivery infrastructure for linked open multilingual data
09:05:02 .. will provide storage, dissemination, content, provision of persistent URIs
09:05:13 .. prefix: http://publications.europa.eu
09:05:27 .. support and encourage data providers to provide RDF
09:05:48 .. visualisation tools based on RDF
09:06:07 .. encourage colleagues to provide their data in RDF too
09:07:00 peter: for open data portal: possibility to contribute ideas, data cataloging ...
09:07:22 peter: crowd-sourced annotation and adaptation of LOD: a question for us
09:07:40 .. we have annotation of official content, but there might be use cases for crowd-sourced annotations
09:07:54 .. but we need to define quality support in this
09:08:08 .. provenance tracking, history and storage are important
09:08:27 .. LOD and authenticity - will there be an organization for LOD?
09:08:38 .. how to implement this concept, how to approve it?
09:08:59 .. for the thesaurus we have release management; for controlled vocabularies we trace histories
09:09:10 .. existing codes will remain in the vocabulary, with time spans
09:09:35 .. further application domains for MLW - LOD in eGov, health:
09:09:52 .. we provide stable and persistent URIs for data
09:10:03 .. would like to discuss: what about authorized relationships?
09:10:31 .. e.g. a journal is published in 23 languages - what is the authority relation here?
09:11:03 .. about LOD from the public side - provide a European Legislation Identifier (ELI)
09:11:21 .. part of standardization of PSI
09:11:45 .. example: http://eurlex.europa.eu/eli/dir/2008/98
09:12:00 .. will allow aligning legislation across Europe
09:12:37 peter: LOD and its role in MLW-LT metadata
09:12:59 peter: integration of LOD through MLW-LT metadata
09:13:25 .. references from web content items, e.g. entries in multilingual thesauri or authority tables etc.
09:13:48 .. enrichment of web content with HQ information, to improve MT, localization workflows etc.
09:14:38 topic: presentation from Europeana (Juliane Stiller + Marlies Olensky)
09:15:18 juliane: working on Europeana 2 - multilingual access to Europeana content and the Europeana data layer
09:15:35 .. Europeana facts: launched 2008, cultural heritage information system
09:15:44 .. data from archives, audiovisual archives, libraries
09:16:09 .. build a digital library as a single access point, today access to about 23 million objects
09:16:54 .. map of Europe - main content is coming from a few European countries
09:17:12 .. e.g. if metadata comes from France, the metadata is in French, but the object might be in a different language
09:18:06 .. how do multilingual access and search work on Europeana? it involves the interface,
09:18:14 .. search (query translation and document translation)
09:18:31 .. result presentation (enable users to assess relevance of results)
09:18:42 .. and browsing, important for the cultural heritage domain
09:19:01 .. people need to be able to "find the unknown"
09:19:23 .. Europeana: static interface is translated into 26 different languages
09:19:40 .. query translation prototype developed for 10 European languages
09:19:59 .. a document can be translated after being found, via MS translation API
09:20:06 .. now work on semantic data layer
09:20:21 .. multilingual alignment of controlled vocabularies etc.
09:21:10 marlies: EDM comprises cross-domain metadata - library, archive, museum
09:21:36 .. EDM is an umbrella for different domains and levels of granularity
09:22:05 .. Europeana is a cross-domain framework, using SKOS, CIDOC-CRM etc.
09:22:15 .. can then use specific parts, e.g. from the museum domain
09:22:39 .. basic distinction: "provided item" vs. its digital representation, plus metadata record
09:22:47 .. allow for multiple records of one object
09:23:08 .. composition of objects, important for the library and archives domain
09:23:17 .. can represent contextual resources
09:23:29 .. a metadata format that can be specialized
09:24:36 .. EDM case studies, see http://pro.europeana.eu/web/guest/case-studies-edm
09:25:02 .. LOD pilot with 2.4 million objects, contributions from Spain, Norway, Austria, Sweden, Belgium
09:25:40 .. places or items are mapped to places
09:25:46 .. multilinguality and EDM:
09:26:04 .. semantic data is multilingual, see data cloud developed in the EuropeanaConnect project
09:26:15 .. different vocabularies are aligned with each other
09:26:36 .. not only do multilingual vocabularies allow for multilingual search results
09:26:54 .. we also align monolingual results by a pivot vocabulary
09:27:47 .. language tags play a role in EDM too - labels in different languages
09:28:23 .. now an example of how the Europeana portal deals with multilinguality
09:28:51 juliane: example search for "cheval" - result list, facets, filtered by language
09:29:09 .. metadata is in Czech
09:29:47 .. "cheval" is not in the metadata - the result was found because the metadata fields were enriched with different language versions, including a thesaurus with the term "cheval"
09:30:09 .. with multilingual enrichment of vocabularies you can enhance multilingual search
09:30:24 .. we are now working on how to present this to the user
09:30:30 .. summary: Europeana is very multilingual
09:30:43 .. multilingual "metadata + object + user"
09:30:57 .. hard to retrieve objects, but also to present them to a multilingual audience
09:31:27 topic: "setting the stage" - Q&A
09:32:37 daveL: question for davidO - easy multilinguality on the web - will that be a source of a big change?
09:32:40 davidO: yes
09:33:04 .. MT has been very active, statistical approaches created a new generation of MT
09:33:14 .. now statistical MT has reached its maximum potential
09:33:25 .. now there is an opportunity to apply further techniques
09:33:46 .. not only for translation of a piece of content, but also for acquisition of content
09:33:49 .. e.g. for training
09:34:11 .. additional techniques to differentiate between speakers, understand what not to try to transcribe
09:35:01 paul: talking about translation and multilinguality, in the sense of terms / multilingual alignment / multilingual resources
09:35:31 .. what is the role of e.g. LR? what role does NLP play in Europeana?
09:35:44 .. e.g. beyond terms: lexical resources, morphological information
09:36:03 juliane: very important, in EuropeanaConnect we built language resources
09:36:20 .. we were looking for resources to implement cross-lingual search
09:36:38 nicoletta: we should discuss very carefully the role of LR with respect to MLW, SW
09:36:50 .. there are so many dimensions one should touch
09:37:16 .. at the moment (in MLW-LT) we discuss metadata - "content" in Europeana is another level that touches the lexicon etc.
09:37:25 .. there is also big data in our field
09:37:53 .. so there are so many dimensions - the role of LR in a big data environment needs to be discussed, including policy issues
09:38:16 paul: policy issues in terms of standardization in the EU context are important too
09:38:36 .. how do we deal with standardization, making linked data (content, lexicon / linguistic) "official"
09:39:23 kimmo: very important that we don't kill emerging new activity by standardizing
09:39:46 .. previously things have been killed by standardization
09:40:12 .. our practical approach in MLW-LT and other projects has been: we impose standardization by example, not by conditions and rules
09:40:41 .. it has the disadvantage that it is slightly chaotic, but I would still speak in favour of getting people involved in doing something
09:40:59 .. e.g. the EU Publications Office should take the lead in standardizing their work
09:41:35 .. that might become a part of a standard or not, and we need to link this to standards work
09:42:23 philArcher: talked to the EDM people about EDM possibly becoming a part of W3C standardization; also have a lot of questions for Peter, will take that offline
09:42:59 thierry: also important to see if we need other standards like LMF for lexicons etc.
09:43:48 Question: what is the one take-away from this session?
09:43:52 arle: question for davidO - how to build the bridge from the current efforts we saw so far to your vision?
09:44:44 davidO: competition will drive adoption of models
09:45:06 .. this is just the start of a conversation - Europeana is a wonderful initiative
09:45:23 .. would love to see it evolve into commercial opportunities as well
09:46:03 alexanderLik: why not use regulation as a vehicle for industries to implement things?
09:46:43 Ioannis: who will pay for it?
09:46:49 .. things need to be market-driven
09:47:01 .. if there is no carrot nothing will happen
09:48:02 olaf: to all the speakers - what is the role of the crowd in metadata gathering, metadata definition etc.?
09:49:23 juliane: doing that in Europeana to some extent - looking into user logs etc.
09:49:48 now break
10:00:42 before break about "qero" project (link to be provided later)
10:17:19 Dave Lewis: This section is about linking data.
10:17:47 topic: presentation from Sebastian Hellmann
10:17:57 Sebastian Hellmann: researcher at Uni Leipzig. Will be talking about linked data for NLP and Web annotation. This is a broad topic. I will point to projects as an overview.
10:18:19 scribe: Arle
10:18:47 ..Motivational slide: lots of walled gardens. This is the way it was before RDF. There are many beautiful gardens, but you can't go between them. I want to talk about turning walled gardens into networks of parks.
10:19:24 ..How do we leverage linked data for NLP? Linked data sets cover many domains. The data is crowdsourced. This is the background.
10:19:33 ..RDF is about semantic interoperability.
10:19:49 ..A third factor is making the output of NLP available on the web.
10:20:19 ..Slide shows a huge number of Linked Open Data repositories. Currently linguistic data is one part, under cross-domain.
10:22:15 ..Linguistic Linked Open Data Cloud: links many areas. How do you fund this? It is difficult to fund any one institution. There is a time horizon on funding, which may lead to the death of projects.
10:22:30 ..Funding for the cloud remains difficult.
10:23:14 ..DBpedia includes eight interlinked language versions. Individual language data is available.
10:24:25 ..Wiktionary2RDF: communities create wrappers (made by domain experts). Converted to Lemon via Mediator. Anyone can join the community: http://dbpedia.org/Wiktionary
10:25:54 ..Web technologies for integrating NLP tools and approaches. Once you are immersed in a technology, you don't see other solutions and start trying to apply it where it doesn't apply. There are cases where RDF makes sense, but others where relational databases make sense. Learn *when* to use it.
10:26:28 ..My solution: RDF allows linking between walled gardens. It has certain properties other data models do not provide.
10:26:38 Says RDF is the way to link up different gardens, but not why
10:27:34 ..Advantages: URIs available, formal documentation (like UML), easy-to-understand structure, many tools (e.g., LOD2 Stack), indexing and querying allow seeing the big picture.
10:28:17 ..NLP Interchange Format (NIF) aims at interoperability between NLP tools, language resources, and annotations.
10:28:40 ..First released September 2011. Open project. Growing with feedback.
10:29:01 ..NIF allows interlinking between various tools (slide shows structure and tools).
10:29:58 ..No current standard mechanism to connect the WWW, the Giant Global Graph (GGG), and NLP. There is no way to combine the three.
10:31:02 ..Want to allow annotation by various tools. Also human annotation (links, free text, correction of NLP annotations).
10:31:39 ..But all this does not work together. It still has the walled garden problem. The Semantic Web is supposed to fix this, but a lot of work remains.
10:32:10 ..Showed example of how to make it work.
10:32:23 ..Feel free to join.
10:32:56 topic: presentation from Dominic Jones
10:34:00 Dominic Jones: Want to start with an infographic to show the world by nationality. Want to add to it traditional print media and also user-generated content (on electronic devices). These types of content are very different. What we produce is somewhere in the middle.
10:34:22 ..Issue is the challenge of how to localize this stuff.
10:35:14 ..Compare Flickr, Reddit, etc. Raises issues: provenance, access control (linked *open* data vs. linked data; this may be a blocking issue)
10:36:14 ..Architecture based on CMS-LION, uses XLIFF messaging between various components. What we add is an RDF model of translation, provenance, CNGL service and content models.
10:36:51 ..These models represent the data we deal with.
10:37:42 ..The Book of Kells is here at Trinity, written on calf skin, in a big glass case. You can't access it yourself; it relies on gatekeepers. Compare it to the iPad, where the consumer becomes the producer.
10:38:37 ..CMS-LION emphasizes user-generated content. Compare to the "telephone" game: our system lets us know who changes what in the translation, what happens in QA, what is consensus, etc.
10:40:04 ..Will show an example of a tweet. We break it into the content model, and into a job. Provenance is critical in working with things. We use a lightweight version of the Open Provenance Model.
10:41:40 ..We have Artifacts, Processes, and Agents. Able to map the process diagram to these things. This process allows us to enrich the translation model with information on who did things, and how.
10:42:24 ..We are integrating CMS-LION with Panacea. Focus on post-edits to retrain MT systems. Also tying in with an LSP (VistaTEC).
10:43:10 ..Will use CMS-LION as a test-bed for ITS 2.0 and tie it in with Solas (Limerick test bed for workflow orchestration).
10:44:13 ..Now we are at the intersection of Multilingual Semantic Web, Language Resources, and Localization. This is MLW LOD.
10:44:54 topic: presentation by Jose Emilio Labra Gayo
10:45:48 Jose Labra: My work is in multilingual LOD. We translated product schemes and procurement vocabulary in EC projects.
10:46:46 Jose Labra: We have HTML, which is human-readable, but how do we move from that to machine-readable data that is intrinsically multilingual?
10:48:29 ..There is data and there is *multilingual* data. We need to account for human-readable information (e.g., "professor" vs. "catedrático"). Moving from this to machine-readable is a challenge.
10:49:12 ..Want to talk about best practices impacted by multilinguality. Have 8 best practices.
10:49:41 ..1. Design a good URI scheme. Cool URIs don't change, identify things, and are human-readable.
10:49:43 Note to self - Gov Linked Data WG Best Practices on Linked Data needs to include a section on multilingualism - ref. Juan's talk
10:50:24 ..e.g. dbpedia.org/resource/Spain is good.
10:51:09 ..I'm not sure if internationalized URIs are good or not. They can create problems with phishing, limited support, and human readability across languages.
10:52:38 ..2. Model resources, not labels. URIs should map to contents, not to particular labels. We don't want to map to different language labels. Use a universal pointer with RDF labels to language-specific versions. But this can cause problems with thesauri. SKOS uses URI-identifiable labels.
10:53:16 ..Question: What happens if we want to use localized URIs? Perhaps using language identifiers in the URI is good, but I don't know.
10:53:49 ..3. Use human-readable information. Machine information can also be human-readable.
10:54:17 ..Question is how to balance between the human-readable and the RDF world.
10:55:13 ..4. Use labels for all the entities you model, not just concepts, not just main entities. Displaying labels is easier if you don't have to make multiple requests.
10:56:12 ..Problems: Selecting the proper label. Only 38% of non-information resources have labels. Also, avoid camel case or similar notations in labels. "UniversityOfOviedo" is a bad label.
10:58:07 ..5. Use multilingual literals. IETF language tags let you select the right tag. But multilingual literals can create problems. The right technology can deliver less than ideal results. E.g., SPARQL works with labels; what happens when you use a language-bound query (e.g., for "Professor" without a language tag)? Need to create a default label with no language tag.
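Labra's fifth point rests on RDF treating "Professor" and "Professor"@en as different literals, so a query for the untagged string does not match the tagged one. The following is a minimal Python sketch of label selection with an untagged default as the last resort. The (text, tag) tuples and the `pick_label` function are hypothetical illustrations, not part of any RDF library, and the subtag matching is deliberately simpler than RFC 4647; in SPARQL itself, the `LANG()` and `LANGMATCHES()` functions serve a similar purpose.

```python
from typing import Optional

# Hypothetical representation of an rdfs:label value: (lexical form, language tag),
# where a tag of None stands for an untagged literal such as "Professor".
Label = tuple[str, Optional[str]]


def pick_label(labels: list[Label], preferred: list[str]) -> Optional[str]:
    """Choose a display label for the preferred languages, best first.

    For each preferred tag, try an exact (case-insensitive) match, then a
    match on the primary subtag only ("en-GB" vs "en"). If nothing matches,
    fall back to an untagged literal, i.e. the default label recommended in
    the talk. Returns None when no usable label exists.
    """
    for tag in preferred:
        wanted = tag.lower()
        for text, lang in labels:
            if lang is not None and lang.lower() == wanted:
                return text
        primary = wanted.split("-")[0]
        for text, lang in labels:
            if lang is not None and lang.lower().split("-")[0] == primary:
                return text
    # No language-specific match: use the untagged default label, if any.
    for text, lang in labels:
        if lang is None:
            return text
    return None


labels = [("Professor", "en"), ("catedrático", "es"), ("Professor", None)]
print(pick_label(labels, ["es"]))     # catedrático
print(pick_label(labels, ["en-GB"]))  # Professor (primary-subtag match)
print(pick_label(labels, ["fr"]))     # Professor (untagged fallback)
```

Without the untagged entry, the last call would return None, which is the gap the recommended default label closes.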
10:58:44 ..This is largely unused (only 4.78% of info-resources use a language tag, and only 0.7% use more than one).
10:59:44 ..We need to balance between RDF and XML and be aware of the consequences of mixing. This is a challenge.
11:00:36 ..6. Use content negotiation. Use Accept-Language. Without it we end up returning too much data. It allows you to get labels in the language you want.
11:02:33 ..7. Include labels without a language tag. This makes SPARQL queries easier. Need to know what the default language is. Is there a way to declare the primary language of an RDF data set?
11:03:35 ..8. Use multilingual vocabularies. They should include descriptions in more than one language, but most do not. Also, what to do when a vocabulary is not localized?
11:04:10 ..Raises issues of when categories don't map precisely across languages.
11:04:50 ..Some other issues: Unicode support. Microdata doesn't allow language declarations. Internationalization is not covered in RDF.
11:04:59 ..LOD offers new challenges.
11:05:40 topic: Question and Answer
11:06:50 Question: José, you recommended using literals without language tags at all. But what happens when the literal can mean different things in different languages? E.g., Gift in English is very different from Gift in German.
11:07:26 José Labra: These are difficult issues. There are practices to model lexicons and separate concepts from labels.
11:08:36 Sebastian Hellmann: URIs are not human-readable if you do not use IRIs, because you have to use %-encoding, which is impossible to read. In DBpedia we use IRIs. We think libraries should support IRIs.
11:09:06 Maxime Lefrançois: I have a question for Dominic. Do you use the W3C Provenance concepts?
11:09:34 Dominic Jones: The Open Provenance Model predates the W3C work and gave rise to it, but we chose it as an off-the-shelf solution.
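Labra's sixth recommendation, using Accept-Language content negotiation to return labels in the requested language, can be illustrated with a short Python sketch. It parses the header's quality weights and then looks up a label, truncating each range ("en-gb" to "en") in the spirit of the Lookup scheme of RFC 4647, with a language-neutral default as the final fallback (his seventh point). The label dictionary and both function names are hypothetical; the sketch skips some RFC details, such as stripping single-letter subtags before truncation.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return the language ranges of an Accept-Language header, best first.

    Ranges with q=0 are dropped; ties keep the order in which they appeared.
    """
    ranked = []
    for position, item in enumerate(header.split(",")):
        parts = [part.strip() for part in item.split(";")]
        lang_range = parts[0].lower()
        if not lang_range:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if q > 0:
            ranked.append((-q, position, lang_range))
    return [lang_range for _, _, lang_range in sorted(ranked)]


def negotiate_label(labels: dict[str, str], header: str, default: str) -> str:
    """Pick the label whose lower-case tag best fits the header.

    Each range is tried, then progressively truncated ("en-gb" -> "en").
    A "*" range or no match at all falls back to the language-neutral default.
    """
    for lang_range in parse_accept_language(header):
        if lang_range == "*":
            break
        subtags = lang_range.split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in labels:
                return labels[candidate]
            subtags.pop()
    return default


labels = {"en": "Professor", "es": "catedrático"}
print(negotiate_label(labels, "es-ES,es;q=0.9,en;q=0.8", "Professor"))  # catedrático
```

Returning only the best-matching label, rather than every language version, is what keeps the response from carrying "too much data", as noted in the talk.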
11:10:24 Thierry Declerck: Question about terminology (did not catch it)
11:11:23 Tadej Štajner: For Sebastian. I've been following the NLP to RDF work. Is there any work on encoding this in an inline format directly in a document? Some of our use cases require this rather than storing the annotations separately.
11:12:08 Sebastian: It might be possible, but it is difficult in general. It is hard for any annotation format. Maybe it is easier with RDFa. We'll have to discuss more.
11:13:14 Pedro Diez: Maybe we need to make a distinction between the kinds of data we are trying to link. We need an unambiguous map to link linguistic data and general lexicons. Right now it is difficult to link to concepts with different literals across languages.
11:14:36 ..Regarding this, maybe we need to distinguish between different kinds of data: brands, names, telephone numbers, words. Most of the work we can reuse is in lexical databases. They represent hard work.
11:15:15 Maxime Lefrançois: In the Linked Open Data Cloud, is there work for linguistic data? What are the links between ???
11:15:47 Sebastian: It is a draft right now, but for the Linguistic Linked Open Data Cloud, it is more of a vision.
11:16:05 ..We hope the original LOD Cloud will get more interactive over time.
12:15:35 scribe: Jirka
12:17:04 topic: Linked Open Data and the Lexicon session chaired by Arle Lommel
12:17:17 scribe: Jirka_
12:19:11 topic: Bringing Terminology to Linked Data through TBX by Alan Melby
12:21:01 Alan: one of the objectives of the workshop is to bridge the gap between traditional terminology and LOD/LD efforts
12:21:58 ... TBX/RDF
12:22:36 ... TBX is the TermBase eXchange standard
12:23:18 ... not a single language, but a family of languages called dialects
12:25:48 ... TBX/RDF is an isomorphic mapping of TBX to RDF
12:26:03 ... allows lossless bidirectional conversion
12:26:22 ... why have TBX/RDF?
12:26:39 ... there is a lot of knowledge in TBX; it should be part of LD
12:27:13 ... LD can benefit from term disambiguation
12:27:35 ... access to a well-established knowledge engineering community
12:27:59 ... provides concept-based information for translation
12:28:19 The terminology industry uses TBX for interchange, but
12:28:47 it is not well suited to an online, open style of data exchange like RDF is
12:29:10 thus this project is for terminologists to benefit from LD as well as the other way around.
12:29:24 ... URIs for data categories are in www.isocat.org
12:29:54 ... TBX/RDF uses XML+RDFa 1.1
12:32:49 topic: Managing Director, Interverbum Technology by Ioannis Iakovidis
12:33:48 Ioannis: Our company develops TermWeb, a terminology management system
12:37:50 Ioannis describes what the tool can do and how complex a terminology workflow can be
12:39:00 ... terminology is everywhere
12:40:16 ... challenges: integration with other tools (no standard API)
12:40:27 ... term identification and tagging
12:47:56 ... some standard/convention is highly desirable
12:48:32 topic: Extending the Use of Web-Based Terminology Services by Tatiana Gornostay
12:50:36 Tatiana is from Tilde, a company providing localization, translation, ... mainly for LV, LT, ET and RU
12:50:59 ... they have to deal with terminology every day
12:51:46 ... efficient communication requires terminology
12:54:06 ... terminology bridges language and semantic technologies
12:54:40 ... eurotermbank.eu - the second-largest terminology database in Europe
12:57:38 ... describes the Accurat & TTC tools/projects
12:58:07 ... TaaS - Terminology as a Service
13:01:46 ... terminology can enhance automation in LOD
13:02:25 ... terminology helps in automating work with multi/cross-lingual metadata
13:03:09 topic: The Need for Lexicalization of Linked Data by John McCrae
13:08:13 ... PYTHIA - an ontology-based question answering system
13:09:36 ... a proper rdfs:label is present on only about 2% of content
13:11:05 ... labels are very ambiguous
13:13:25 ... created a lexicon model relative to ontologies
13:13:40 ... built on ISO 24613 and SKOS
13:15:36 ... further development under the W3C Ontolex CG
13:18:29 topic: Cool URIs Are Human Readable by Phil Archer
13:19:32 ... ISA - Interoperability Solutions for European Public Administrations
13:22:58 Phil describes what interoperability means for terms
13:23:23 ... a unique identifier for a term is very important for interop
13:25:21 ... domain names used in URIs should be neutral
13:29:57 ... providing equivalent localized URIs is friendly for users, but adds more work for publishers and needs more processing power
13:30:42 topic: Q&A
13:33:18 Pedro: Déjà vu - we had a similar problem 18 years ago.
13:35:19 Tatiana: IMHO natural language can't be formalized. ...
13:36:15 Kimmo: The EU Commission has been trying to do this for years. Formalization of concepts has limits.
13:37:16 Felix: 18 years ago there was no Web.
13:38:44 Arle: Hopefully we are at an inflection point now and we will see rapid progress.
13:40:42 Kerstin: Some terms are translated only in some languages.
13:41:25 Alexander: What to translate and what not to translate can be held in metadata.
13:43:44 Tadej: RDF is unable to express that something is a translation.
13:49:50 scribe: tadej
13:50:06 topic: Identifying Users and Use Cases - Matching Data to Users
13:51:52 DaveL: After an interesting discussion, it's time to draw out concrete use cases
13:53:40 Thierry: Can you give us short descriptions of the use cases for the technologies and standards described today? What are the business cases, and how can language LOD improve business?
13:54:56 PeterS: There are several in the legal domain and public services.
13:55:10 ... From the transparency side, equivalence of languages is important.
13:55:56 ... For the open data, we have to come back to this
13:56:15 Thierry: Is there a concrete requirement for one of those use cases?
13:57:14 PeterS: There were interactions; a lot of feedback comes from institutions that use the technology, as well as from the EU member states, but more on the formal side. There is less informal feedback from the community.
13:58:34 Julianne: For Europeana, we are gathering feedback not just from librarians in the community, but also from the wider public via the Europeana Connect initiative.
13:59:13 ... In terms of multilingual uses, we did access log analyses and we do have multilingual users, but the majority are monolingual, and they need to be able to use it.
13:59:38 Thierry: How was the access implemented? SPARQL?
14:00:10 Julianne: For now, the interface is via downloading a dump; the rest is being implemented. A multilingual interface is difficult to implement.
14:00:51 ... With regard to users, there are several groups in terms of requirements: some come from education, some are professionals, and some are the general public.
14:01:42 fsasaki: Are users from the localization/internationalization area a new use case for you?
14:02:51 Julianne: We are aiming to have cross-lingual descriptions of our resources for other use cases as well.
14:03:57 PeterS: We are at the end of the chain; the data gets generated upstream.
14:07:23 SebastianSklarß: We work with the German government on e-government projects. We started by using Drupal, and in general open source CMSes are a big trend in this space.
14:07:45 ... LOD, inferences and semantic technologies still need a lot of acceptance management on the e-government side.
14:08:56 philr: We are very interested in using RDF for integrating various info sources. We want to provide a full picture of the data; currently we do this by synchronizing the disparate sources, which is complicated.
14:09:26 ... We see an opportunity in using RDF for this kind of data integration.
14:10:21 Pedro: We work in localisation. We have 2 scenarios where we need this metadata.
14:11:04 ... 1) real-time processing of metadata for normal translation workflows
14:11:46 ... 2) real-time translation, which happens synchronously on page visit. Here, the client needs to be able to influence the process of serving real-time translated content.
14:13:58 TatianaG: At Tilde, terminology management is an invaluable source for automatic cross-lingual annotation.
14:15:01 AlanMelby: What is the status of expressing terminology in ITS 1.0?
14:15:26 Yves_: There is a termInfo* family of attributes that expresses that.
14:16:32 AlanMelby: What about looking at meanings and not just strings of characters? Ambiguous queries in information retrieval can be resolved via terminology disambiguation.
14:17:24 ... Semi-automatic term concept suggestion could solve this: a human can quickly scan and link documents to existing terminology databases.
14:17:49 ... One of the sticking points is the ontologies: how to name certain subject fields?
14:18:19 ... You can get a long way with the current state of terminology resources.
14:19:21 fsasaki: Are there any things that customers look for? Any workflows that need to be supported?
14:21:22 AlexanderLik: We don't operate our solutions on the web, but internally. If the new standard supports DITA-based XML, it's easy for us to adapt. A new format would be prohibitive for us.
14:22:33 Pedro: We have clients with public and private areas of information. If you can't keep personal information in the sources, you also can't keep it in translation memories.
14:22:48 ... These things could be better controlled by using a tag set for that.
14:23:25 Thierry: Are there any other aspects of security in Linked Data?
14:24:40 Pedro: If clients need special security measures, the server needs to be located on the client's premises, so that is also an operational issue for us.
14:25:16 paulb: There are several people at DERI working on access control for linked data. There is demand for this functionality from commercial entities.
14:25:50 Arle: Pedro's concern is legitimate: not all Linked Data is Linked Open Data. This is an important aspect for many users.
14:26:49 PhilA: LD != LOD. W3C is aware of that; last week at the Data Forum, a paper presented an approach that had an ACL flag for every triple.
14:27:00 ... a lot of solutions are coming.
14:27:27 fsasaki: The Linked Data Platform WG is also gathering requirements on that aspect.
14:28:00 see http://www.w3.org/2012/ldp/charter for more info
14:29:02 ??: When converting structured XML data into more redundant RDF data structures, clients need arguments for taking that approach.
14:29:42 Thierry: What is the benefit of LOD as opposed to plain XML? Demonstrating business value is important.
14:30:37 DesOates: At Adobe, we already have a "Linked Closed Data" corpus of terms. It's expensive to maintain and manage. Our problem is not openness, but actual applicability.
14:31:03 ... Developers will never look up termbases for UX or documentation. The Linked part is irrelevant if we can't get it into the software source.
15:08:29 Kimmo: bridge building is often talked about but is not very easy to do
15:08:37 scribe: dgroves
15:08:54 topic: presentation from Georg Rehm - META-NET and the strategic research agenda + LOD
15:10:00 Kimmo: we are often overwhelmed by the multitude of information. META-NET (in particular META-SHARE) is a European initiative to gather related language tools and resources under one access point
15:10:43 META-NET project site: http://www.meta-net.eu/
15:11:08 META-NET Strategic Research Agenda and Linked Open Data presented by Georg Rehm
15:11:46 Georg: META-FORUM conference, June 20/21 in Brussels
15:13:09 Georg: what META-NET is trying to do is provide each language community with resources in order to overcome language borders, the last remaining border
15:13:51 Georg: focused on the under-resourced languages of Europe
15:15:32 Georg: 3 lines of action: META-VISION, META-SHARE, META-RESEARCH
15:16:52 Georg: ELRA/ELDA are part of the consortium and have already committed to use the META-SHARE approach
15:18:19 Georg: META-NET grew from the T4ME, CESAR, METANET4U and META-NORD projects
15:19:39 Georg: Strategic Research Agenda - a document to mobilise researchers, users and providers of LT for collaboration & community building
15:20:43 Georg: an effective SRA requires an appropriate programme, appropriate actors and appropriate support
15:21:33 Georg: 3 vision groups: Translation and Localisation; Media and Information Services; Interactive Systems
15:22:55 Georg: produced 30 white papers discussing language resources and identifying gaps in terms of resources, funding etc.
15:24:27 Georg: Research Priority Themes are "Translation Cloud", "Social Intelligence" and "Socially Aware Interactive Assistants"
15:25:40 Georg: Translation Cloud is not restricted to Machine Translation; it includes human translation/translators, LSPs...
15:26:39 Georg: Social Intelligence is concerned with multilingual text mining, discussion platforms at a European level...
15:27:18 Georg: Socially Aware Interactive Assistants = "super-Siri", focus on Speech Technology
15:29:41 Georg: Data Challenge will be one of the challenges for Horizon 2020. It is concerned with the data value chain, open data, big data, linked data
15:32:05 Georg: Under the translation cloud, access to data (such as linked data) can be used to help improve language technologies, including machine translation
15:33:41 Georg: Social intelligence can help provide novel methods to access data, to clean data sets...
15:34:07 Where
15:35:08 Georg: We need to consider data security issues and how they affect data distribution/sharing
15:35:33 ... can I find out more about the Socially Aware interactive applications? Specifically speech recognition? Can't find anything useful on either http://www.meta-net.eu/forum/the-strategic-research-agenda-for-multilingual-europe/ or http://www.lt-world.org/
15:36:08 Georg: we need to publish various language resources (terminology, TMs, wordnets etc.) as linked (open) data
15:37:37 Georg: META is an open strategic alliance with 600+ members with the goal of supporting the Strategic Research Agenda: http://www.meta-net.eu/join
15:37:39 omstefanov, more info is at http://www.meta-net.eu/sra - soon you'll have the SRA linked with more info about that topic and others from here
15:39:36 Kimmo: Horizon 2020 is still only a proposal; nothing is set in stone. The topics may change, the Data Challenge is not guaranteed to be a challenge, and even if it is, there is no guarantee that language will be included.
15:40:07 DLewis: What is the timeline for these decisions to be made?
15:40:33 Kimmo: By the end of next year. It's a political process
15:41:21 NickCampbell: Is the Latvian person really going to telephone someone in Portugal?
15:41:37 Georg: It is a vision - looking beyond what is currently possible
15:42:48 RSchaler: Siri isn't great, but it works. What is the really revolutionary output of what you're presenting?
15:43:55 Georg: The capabilities are very limited on the iPhone; you can't really engage in a dialogue with the system. We want to extend it to something that follows you across devices and has a profile of your preferences etc.
15:45:43 Q: We have a limited number of languages. The EC/META-NET could invest in making the speech tools support some of the lesser-supported languages, or help finance projects that would broaden speech technology to more languages and more accents
15:46:44 Georg: Many of these technology research gaps are identified in the language white papers.
15:46:48 So far, the language industry functions based on markets?
15:48:14 PhilArcher: in the W3C office in New Delhi it came across that people preferred to "speak" to the web - based on the proposed work you can really transform countries
15:49:14 Would companies have invented a keyboard if users could have talked to devices?
15:49:16 DLewis: We always have to be aware of the question of what is truly innovative/novel about the work.
15:50:38 Is 'vision' about extending current technologies to other languages, platforms, environments?
15:50:55 Q: Clearly, adding current capabilities to new languages is a huge issue. Regarding the current capabilities of the technology, are any theoretical breakthroughs needed to accomplish the vision for 2020, or is it just a matter of applying existing technologies better?
15:52:08 Georg: Of course many breakthroughs are needed. One of the main components of the SRA is a roadmap document that builds timelines of what needs to be done when, with priorities identifying which problems need to be solved along the way
15:52:36 Georg: we have to be realistic and credible at the same time
15:53:12 DesOates: the key is to join the dots between the technologies. The round-tripping between the technologies still needs to be done
15:54:40 We need to make sure the SRA includes the link between language resources and language technologies
15:55:13 Kimmo: You can have a critical discussion about this at META-FORUM
15:56:58 Kimmo: We should not let the fact that Google/Apple have already developed these technologies discourage further research/development - trying them out, it is clear that the problems (e.g. interactive assistants, subtitling) are not completely solved.
15:57:53 The point is not to say: Apple or Google have solved the problem (they might or might not have), but whether we can manage to come up with revolutionary new ideas for technology solutions, rather than working on extending existing ones.
15:58:06 Kimmo: session concluded
15:59:15 scribe: mhellwig
15:59:35 topic: action plan discussion led by Dave Lewis
16:00:17 DaveL: What changes in thinking and planning do we need to advance the MultilingualWeb?
16:01:13 DaveL: we have two communities (language resources and language technologies) and there are synergies between them - they share conferences and projects
16:02:00 ... a lot of the activity relies on research, which causes people to worry about sustainability
16:03:05 ... on the other side we have the private sector, which may be driving activities
16:04:16 ... besides multilingual or multinational organisations, there is the localisation industry, which has a long tail of companies
16:04:42 ... MultilingualWeb-LT is interesting, because it brings the communities together
16:07:17 ... the question is: how do we address the requirements of the communities?
16:07:55 ... and how do we gather information from all the communities?
16:09:48 Q: Where is the data to be linked? Maybe we need an inventory of the most important data that needs to be linked. Who cares enough to pay for it?
16:11:23 ??: If you talk about sustainability, what is the motivation for people to host the LOD? All participants in the LOD cloud have different incentives, and that's why it works.
16:12:22 ... the publisher provides the data and then everybody can take it and use it. As long as the incentive is there, LOD will continue
16:13:37 dF: so far no real use case has emerged. LSPs serve the publisher, but the publisher doesn't need the data link. Another problem: "here is data, now link it" is the wrong approach. It is not sustainable to link things that were created
16:14:49 ??: the motivations for LOD are very diverse. One motivation to open databases could be that there are many silos in information technologies that do not integrate. So a motivation is to integrate and create interoperability.
16:15:24 ... people want to open data, data that is multilingual
16:16:28 fsasaki: Luxembourg workshop: what are the important parts of the infrastructure work needed to make the presented vision work? Best practices would help.
16:16:48 ... a project that creates best practices and uses them will be helpful
16:18:15 SebastianSklarss: food and agriculture organisations - started in 1980 - converted their data to make it linkable. They provide the infrastructure for linking up; they feel they have the mandate to host the data. From there you can start linking up
16:19:24 ??: data warehousing exists separately from the language industry; the data is structured to extract meaning from the information.
16:19:59 ... (referring to LOD cloud) it looks like a swarm. one breakthrough to find is
16:21:40 ??: In the Open Linked Data WG it was discussed that it's not really applicable to linguistics. Persons have the same level of granularity, but this is not the case here. Secondly, privacy issues are problematic; linguistic data is natural speech
16:24:38 Pedro: the internet has three problems: first, how to retrieve and maintain large amounts of data. Second, how to make it accessible. Third, data maintenance and retrieval should be cheaper.
16:25:00 ... We have Web 2.0, now 3.0. We cannot do 4.0 without LD
16:26:04 ??: Emergency cases can use LD across languages, for example when a building is on fire and the cleaning lady doesn't speak English
16:26:38 Arle: What do we need from localisation service providers? We need the involvement of various communities; we want to bring them together.
16:28:06 ??: it's not the localisation service providers' / tool creators' fault if prices go. They have to be aware that the industries are not willing to pay more for fragmentation changes
16:29:40 DaveL: What the motivation is, is a good question. We can publish stuff, but making the links is the important point. Maintaining links could be expensive. We have to make LD useful. Data warehouses are a good example, because they found a way of making it profitable.
16:30:43 dF: data hygiene is expensive. Back to the LSPs: they serve publishers, but there is a disconnect between the two.
16:32:31 fsasaki: data needs to be transformed: bring it into an XML format, then into an HTML format. You run into issues. The key issue for LOD in the diagram (DaveL's presentation): is it only an interoperability layer, or is it a layer to organise the workflows?
16:34:17 dF: use cases are very important. The incentives are there; they are signs of sustainability. Often there is no consensus in the industries, or no implementation commitment. Emphasis should be on real use cases.
16:35:24 ??: two motivations for the LOD movement: government accountability and responsibility. And for that you need links.
16:36:00 ... The public sector uses LOD. There is the UK crime map; on a heat map you can see crime. The police are the biggest users; they save money by accessing their own data.
16:36:33 ... publishing data is expensive, but cheaper than the alternative
16:37:56 Thierry: you need to publish the documents in the public sector, but they are neither compatible nor comparable. Linking data is best.
16:38:50 ... It's cheaper to have only one terminology database that others can link to. In the medical domain they are pushing that. We should talk to those communities to see how it's working for them
16:39:43 SebastianSklarss: micropayment is missing for LOD. If it is made for machines, we need mechanisms to make machines pay for it.
16:40:40 dF: micropayment is feasible.
16:41:25 ... LOD should be paid for by the government. It's a kind of utility, paid for by taxes. We need to influence the policy makers. If they don't see it, then we must go for market sustainability.
16:42:34 DaveL: two parts - gain public sector support with very clear use cases for the public sector; on the other hand, add value (e.g. data warehousing) for the private sector
16:43:51 Pedro: the amount of LOD is so huge that it cannot be controlled by one body or organisation
16:44:29 DaveL: What we need to look for is use cases.
16:44:59 ... Are there people in the room interested in looking at these issues?
16:45:59 fsasaki: would like to add another use case: the processes that data goes through.
16:46:42 ... define provenance and the state of the translation. It's already done, but it's expensive. Maybe we can make it cheaper
16:47:45 fsasaki: see also the linked open data working group - co-chaired by IBM with one motivation: making software development processes easier to organize, more sustainable etc.
16:49:17 Nicoletta: small use case: can we link what we have in META-SHARE with the lingual data? Otherwise we risk running in different directions.
16:50:16 ... One of the problems in linking the cloud is the double meaning of "data": there's data, and there are the language resources we use to work on the other data.
16:51:03 DaveL: [ends session; thanks participants]
16:51:45 DaveL: Let's talk tonight and in the next couple of days in more detail. We should use the opportunity to move things forward. The use case focus is really important.
16:53:05 Kimmo: closing remarks
16:53:51 ... important comments on government involvement. We do not need to standardise more. I say no first to test you.
16:54:18 ... we need to know: What do you want us to standardise? What has consensus?
16:54:58 ... LOD is a buzzword that is used in a rather sloppy way. We have obsessed over LOD, but it's true that LD is equally linked.
16:55:37 ... Business models - e.g. micropayment - are an important point. We need payment mechanisms, small amounts per transaction
16:55:49 ... a sort of application store in a data environment
16:57:23 ... "What data should be linked and why" is a good question and we should test ourselves. If we don't know, then maybe it's not a good idea
16:57:56 ... The importance of the event is the discussions. And the documentation will be available (scribes, presentations, ...).
16:58:11 ... We have linked the constituents, but it may take several years before things get moving
16:58:50 ... in the future we'd like to co-locate this event
16:59:33 DaveL: [closes today's event; thanks participants]
16:59:54 DaveL: Videos, presentations, transcripts will be available
17:01:11 CfP for the Localization World event is here http://www.localizationworld.com/lwseattle2012/feisgiltt/
17:01:26 DaveL: [thanks to organisers of event]
17:02:43 meeting adjourned