07:12:59 <RRSAgent> RRSAgent has joined #mlwDub
07:12:59 <RRSAgent> logging to http://www.w3.org/2012/06/11-mlwDub-irc
07:14:18 <NSR> NSR has joined #mlwDub
07:25:05 <fsasaki> meeting: MultilingualWeb workshop
07:25:09 <fsasaki> chair: DaveLewis
07:25:14 <fsasaki> scribe: various
07:25:19 <fsasaki> agenda: http://www.multilingualweb.eu/documents/dublin-workshop/dublin-program
07:25:43 <fsasaki> present: manyPeople, many_people, many-people
07:25:58 <fsasaki> topic: welcome
07:26:12 <fsasaki> session to start 9 a.m.
07:26:36 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
07:34:46 <Arle> Arle has joined #mlwDub
07:35:51 <Yves_> Yves_ has joined #mlwDub
07:36:30 <leroy> leroy has joined #mlwDub
07:53:28 <philr> philr has joined #mlwdub
07:54:44 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
07:57:50 <DGroves> DGroves has joined #mlwDub
08:02:23 <davidorban> davidorban has joined #mlwDub
08:02:31 <fsasaki> welcome by dave lewis
08:02:54 <BryanS> BryanS has joined #mlwDub
08:02:56 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
08:03:11 <omstefanov> omstefanov has joined #mlwDub
08:03:43 <fsasaki> introduction by vincent wade
08:03:44 <omstefanov> David Lewis opens conference !
08:04:03 <fsasaki> vincent: welcome to dublin and tcd, delighted to host this workshop
08:04:13 <fsasaki> .. mlw and linking of data across mlw is key to expansion of web 
08:04:23 <labra> labra has joined #mlwdub
08:04:37 <fsasaki> .. in CNGL, we are looking into a value chain from creation to delivery
08:04:44 <fsasaki> .. how mlw content can be integrated
08:05:01 <fsasaki> .. technology on language and multimedia content, personalization
08:05:10 <fsasaki> .. etc. need to be brought together
08:05:30 <fsasaki> .. happy to see so many cngl partners here, collaborating nationally
08:05:50 <fsasaki> .. science foundation ireland has invested a lot into CNGL, DERI focusing into SW
08:06:09 <fsasaki> .. in FP7 and collaborations across the world, we more and more have these roadmap meetings
08:06:28 <fsasaki> .. we are looking into similar problems, so we need to find roadmaps to work together
08:06:33 <fsasaki> .. multilinguality is a key part of this
08:06:57 <fsasaki> now introduction by richard ishida
08:06:57 <omstefanov> maximizing impact of our efforts <- key final point of Prof. Wade's talk!  very important !
08:07:10 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
08:07:26 <fsasaki> richard: 5th of mlw workshop, very happy to see it taking place here
08:07:40 <fsasaki> .. we run the MLW project with help of EC as a project, 
08:08:11 <fsasaki> .. idea was to bring people from different disciplines together, so that they talk, it worked very well
08:08:34 <Jirka> Jirka has joined #mlwDub
08:08:47 <fsasaki> .. during this workshop we will be more focused, but later we will go back to the general MLW workshop type again
08:09:31 <fsasaki> .. 12 years ago Yves Savourel and I started talking about internationalization and localization of schemas, that led to ITS standard; great to see where we came so far
08:09:40 <fsasaki> .. thanks a lot and have a good meeting
08:09:45 <fsasaki> intro by dave lewis
08:09:55 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
08:10:06 <fsasaki> dave: multilingualism and web content is crucial for many businesses:
08:10:20 <fsasaki> .. media providers, richer video and audio, content providers like microsoft
08:10:25 <omstefanov> David Lewis: success of  mlw workshops: getting people together who might not otherwise have met and worked together.
08:10:25 <fsasaki> .. CMS providers, browsers, ...
08:10:39 <fsasaki> .. all need to be aware increasingly of multilingual content
08:10:57 <fsasaki> .. aim to get people into the room from industry and academia, and different parts of the topic
08:11:25 <fsasaki> .. language technology, localization, web people, with w3c as a core place where people meet 
08:11:31 <fsasaki> .. and where we advance things to standards
08:11:41 <fsasaki> .. another key player from a European perspective is the EC
08:11:58 <omstefanov> David Lewis: core of multilingual issues is W3C ... to advance standards. Other core player is the EC which provides ongoing support.
08:11:59 <fsasaki> .. coordination activities have an important role
08:12:12 <fsasaki> .. important both for research and infrastructure support
08:12:24 <fsasaki> .. now fp7 is ending, looking forward to Horizon 2020
08:12:37 <fsasaki> .. has many opportunities to bring things together
08:12:44 <omstefanov> EC's Framework 7 will be followed by Horizon 2020 (not next Framework)
08:13:24 <fsasaki> dave: today please think mostly about bridge building: between various disciplines, industry and research
08:13:50 <fsasaki> .. and esp. the two themes: MLW (HTML5, going into language services industry) and linked (open) data
08:14:18 <fsasaki> .. at the end of the day we have a couple of questions to lay out a roadmap, so keep these questions in mind
08:14:22 <fsasaki> intro from kimmo
08:14:26 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
08:14:47 <fsasaki> kimmo: project officer of the series of workshops - MLW project and MLW-LT project
08:15:22 <fsasaki> .. very grateful to Richard for bringing this community to where we are, 
08:15:36 <fsasaki> .. a few words about our internal re-organization
08:15:52 <fsasaki> .. "my" dg will now be called "dg connect"
08:16:18 <fsasaki> .. in three weeks three units will be merged, the "data value chain" unit
08:16:41 <fsasaki> .. these units were LT, data, and PSI (previously E 1,2,4)
08:16:56 <fsasaki> .. LT portfolio will continue to exist, but in a bigger context
08:17:32 <fsasaki> .. E.4 does not have projects; so the new G3 unit will handle the E1 and E2 projects
08:17:50 <fsasaki> .. new activity of our unit: we will handle policy and legislative issues of public data
08:17:58 <fsasaki> .. so-called PSI "open public governmental data"
08:18:18 <fsasaki> .. also, our unit will handle two infrastructures of connecting europe facility
08:18:42 <Yves_> Yves_ has joined #mlwDub
08:18:53 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
08:19:19 <fsasaki> kimmo: LT community needs to see how they can leverage linked data, SW, big data ...
08:19:30 <fsasaki> .. these are just keywords, here are just some threads:
08:19:45 <fsasaki> .. extracting meaning from text, converting structured data from unstructured data
08:19:57 <fsasaki> .. hard task, but can move forward by bringing LT and data community
08:20:27 <fsasaki> .. a lot of useful work on terminology, ontologies, taxonomies, nomenclatures etc.
08:20:55 <fsasaki> .. very happy that this workshop opens with the colorful speakers related to that area
08:21:14 <fsasaki> .. a few more words on CEF - is about infrastructure
08:21:24 <dF> dF has joined #mlwDub
08:22:08 <fsasaki> .. CEF concludes early designs for 8 infrastructures, e.g. Europeana, multilingual access etc.
08:22:16 <fsasaki> .. this is not research, but building systems
08:23:35 <omstefanov> about 78 meur will be in 3 calls to be published in July 2012
08:24:27 <omstefanov> Obj. 4.1 (27meur), 4.2 (31 meur) and 4.3 (20 meur). 
08:24:29 <fsasaki> kimmo: objectives - content analytics, and LT, scalable data analytics, SME initiatives on analytics
08:24:37 <omstefanov> 4.1 Content analytics and lang tech
08:24:46 <omstefanov> 4.2 scalable data analytics
08:24:56 <omstefanov> 4.3 SME initiative on analytics
08:25:13 <fsasaki> .. our role is about extracting meaning from large types of language based information
08:25:22 <fsasaki> s/our/our (LT)/
08:25:58 <fsasaki> kimmo: I'm here the whole day, if you have questions please let me know
08:26:06 <omstefanov> Kimmo Rossi only here today. Invites everyone to come to see him today if interested / have ideas
08:26:09 <fsasaki> dave - now short self introduction
08:26:34 <mlefranc> mlefranc has joined #mlwDub
08:26:42 <nico> nico has joined #mlwDub
08:32:38 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
08:33:30 <fsasaki> dave: very interesting mix of people here
08:33:44 <fsasaki> .. very happy to have also XLIFF TC people here, who will have a meeting here later in the week
08:33:54 <fsasaki> .. so we have good expertise from OASIS on the localization side too
08:34:26 <tadej> tadej has joined #mlwDub
08:35:03 <fsasaki> dave checking - who is from industry, research, standardization
08:35:20 <fsasaki> then - who is on the LT or SW research side
08:35:45 <fsasaki> dave: have done a good job in bringing the people here
08:37:17 <fsasaki> arle: look at IRC - this is where we make the meeting minutes, but also to gather comments
08:38:12 <paulb> paulb has joined #mlwDub
08:38:14 <fsasaki> arle: crowdsourcing content creation - go and write your ideas on the board, people leading the breakout sessions will make use of that
08:38:20 <ryan> ryan has joined #mlwDub
08:38:22 <Chris> Chris has joined #mlwDub
08:38:38 <ryan> ryan has left #mlwDub
08:39:53 <RyanHeart> RyanHeart has joined #mlwDub
08:40:14 <fsasaki> topic: setting the stage - presentation from david urban
08:40:51 <Arle> s/david urban/david orban/
08:41:12 <Rob> Rob has joined #mlwDub
08:41:36 <fsasaki> davidO: exponential trends:
08:41:51 <omstefanov> David Orban (dotSUB): challenge is understanding the power of exponential trends
08:42:08 <fsasaki> .. in the initial part of the exponential function, people say easily "this is just noise"
08:42:20 <fsasaki> .. famous example is human genome project
08:42:59 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
08:43:59 <fsasaki> davidO: I'm a geek, loving to observe the nature of machines
08:44:01 <omstefanov> naysayers have an easy time, initially. It takes time to reach a point of visibility. 1% often takes "most" of the timeline. Then the exponential curve gets the rest done more quickly
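As a rough illustration of the point above (the first 1% often takes most of the timeline), the sketch below counts how many doublings separate two completion levels. Steady doubling per period and the specific start and target fractions are assumptions chosen for illustration, not figures from the talk.

```python
import math


def doublings_needed(start_fraction: float, target_fraction: float) -> int:
    """Number of doublings to grow from start_fraction to at least target_fraction."""
    if not 0 < start_fraction <= target_fraction:
        raise ValueError("need 0 < start_fraction <= target_fraction")
    return math.ceil(math.log2(target_fraction / start_fraction))


# Assumed milestones: 0.01% done, 1% done, 100% done.
early = doublings_needed(0.0001, 0.01)  # from 0.01% to 1%
late = doublings_needed(0.01, 1.0)      # from 1% to 100%
print(early, late)
```

Under these assumptions both phases need the same number of doublings, which is why progress that is still below 1% is easy to dismiss as noise.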
08:44:30 <PhilA> PhilA has joined #mlwDub
08:45:01 <omstefanov> We've gone from mainframes thru several levels of devices, to reach the final generation of human-oriented devices.
08:45:35 <omstefanov> The latest, called mobile phones/devices number in the hundreds of millions
08:45:48 <omstefanov> the next generation will outnumber the number of people
08:46:18 <fsasaki> davidO: communication among "things" in the internet, talking to humans in emergency or other situations
08:46:32 <fsasaki> .. these devices are going to dominate future computing universe
08:46:43 <omstefanov> in this "next age", the age of autonomous machines, machines will communicate more with each other than with humans
08:47:11 <fsasaki> davidO: autonomous devices already exist, they communicate with each other and us
08:47:32 <omstefanov> eg. of autonomous devices: iRobot - autonomous vacuum cleaner
08:47:43 <fsasaki> .. what are the communication signifiers that enable us to operate a vacuum cleaner or a mobile phone?
08:48:03 <fsasaki> .. what are the decisions of autonomous cars
08:48:09 <fsasaki> .. these are fundamental challenges
08:48:34 <fsasaki> .. all developments are very chaotic  - standards settings and policy making are essential for this
08:48:55 <omstefanov> in terms of policy making decisions are very chaotic ... industry wants to thrust ahead.
08:49:25 <fsasaki> .. consumers jump on board - it is important to balance advantages of new technologies with potential pitfalls
08:49:30 <omstefanov> users want the devices, without thinking too much about consequences
08:49:45 <omstefanov> google glass as e.g. of augmented reality
08:49:46 <fsasaki> .. hard for policy makers to keep up
08:50:19 <omstefanov> de facto industry standards will be faster to develop than those standards that standards bodies develop.
08:50:31 <fsasaki> .. technology developments - augmented reality interfaces, google glasses etc.
08:50:32 <omstefanov> former may influence latter
08:51:58 <omstefanov> "Code is Law" Lawrence Lessig.
08:52:07 <fsasaki> davidO: need of human societies to interact in a positive way - not in a relationship of winners and losers
08:52:38 <omstefanov> ... from CODE and other laws of cyberspace (http://code-is-law.org)
08:52:59 <fsasaki> .. semantic web can accelerate understanding of this
08:53:20 <fsasaki> .. many have seen results that google exposes - creating wikipedia like pages from accumulated search results
08:53:42 <fsasaki> .. bring human component of wikipedia like large scale cooperation together with semantic web processing
08:54:00 <fsasaki> .. we are creating an interoperable hybrid very powerful system
08:54:31 <omstefanov> Human-computer interoperability is coming ! Wikipedia-like pages created using semantic web tools from Google-like data
08:54:48 <fsasaki> .. we are creating the premises of new types of social interactions, that can be abstracted to a political level
08:55:17 <omstefanov> DavidO: recommends the "Proactionary Principle"
08:55:51 <omstefanov> Developing a balanced approach to decision making. 
08:56:34 <fsasaki> topic: presentation from peter schmitz
08:56:48 <fsasaki> peter: building a new architecture in the cellar project
08:56:58 <fsasaki> .. deal also with metadata standardization, format standardization 
08:57:17 <fsasaki> .. esp. in legal domain there are many XML based structures
08:58:43 <fsasaki> .. re-use policy of the EC
08:59:08 <fsasaki> .. purpose is to increase efficiency in EC
08:59:17 <fsasaki> .. currently developing an open data license
08:59:31 <fsasaki> .. publications office: publisher of EU
08:59:55 <fsasaki> .. EC, european parliament, other institutions
09:00:06 <fsasaki> .. publishing in 23 languages
09:00:35 <fsasaki> .. many public online services - eur-lex, eu bookshop, public procurement, r&d on cordis
09:00:47 <fsasaki> .. position of publication office on re-use and SW / linked data:
09:01:14 <fsasaki> .. we are part of EC, so we are part of the execution of this initiative 
09:01:26 <fsasaki> .. we have a re-use policy led by DG INFSO / future DG Connect
09:01:59 <fsasaki> .. in about autumn first version of european data portal will be online
09:02:13 <fsasaki> .. EU level: standardization participation esp. in the legal domain
09:02:37 <fsasaki> .. topics / ideas of re-use of language resources in NLP domain
09:03:32 <fsasaki> .. our contributions: multilingual thesaurus like euroVoc, multilingual controlled vocabularies and taxonomies ("common authority tables"), linked multilingual XHTML content (official journal, case law)
09:03:58 <heididp> heididp has joined #mlwDub
09:03:58 <fsasaki> .. all these will be provided for re-use in new dissemination architecture
09:04:11 <fsasaki> .. legal content started in SGML, now XML, being converted into XHTML
09:04:44 <fsasaki> .. content delivery infrastructure for linked open multilingual data
09:05:02 <fsasaki> .. will provide storage, dissemination, content, provision of persistent URIs
09:05:13 <fsasaki> .. prefix: http://publications.europa.eu
09:05:27 <fsasaki> .. support and encourage data providers to provide RDF
09:05:48 <fsasaki> .. visualisation tools based on RDF
09:06:07 <fsasaki> .. encourage colleagues to provide their data in RDF too 
09:06:15 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
09:07:00 <fsasaki> peter: for open data portal: possibility to contribute ideas, data cataloging ...
09:07:22 <fsasaki> peter: crowd-sourced annotation and adaption of LOD: a question for us
09:07:40 <fsasaki> .. we have annotation of official content, but there might be use cases for crowd-sourced annotations
09:07:54 <fsasaki> .. but need to define quality support in this
09:08:08 <fsasaki> .. provenance tracking, history and storage is important
09:08:27 <fsasaki> .. LOD and authenticity - will there be an organization for LOD?
09:08:38 <fsasaki> .. how to implement this concept, how to approve it?
09:08:59 <fsasaki> .. for thesaurus we have a release mgmt, for controlled vocab we trace histories
09:09:10 <fsasaki> .. existing codes will remain in vocabulary + time spans
09:09:35 <fsasaki> .. further application domains for MLW - LOD in eGov, health:
09:09:49 <r12a> r12a has joined #mlwdub
09:09:52 <fsasaki> .. we provide stable and persistent URIs for data
09:10:03 <fsasaki> .. would like to discuss: what about authorized relationships
09:10:31 <fsasaki> .. e.g. a journal is published in 23 languages - what is the authority relation here?
09:11:03 <fsasaki> .. about LOD from the public side - provide a European Legislation Identifier (ELI)
09:11:09 <Pedro> Pedro has joined #mlwDub
09:11:21 <fsasaki> .. part of standardization of PSI
09:11:45 <fsasaki> .. example: http://eurlex.europa.eu/eli/dir/2008/98
09:12:00 <fsasaki> .. will allow to align legislation across Europe
09:12:37 <fsasaki> peter: LOD and its role in MLW-LT metadata
09:12:59 <fsasaki> peter: integration of LOD through MLW-LT metadata
09:13:25 <fsasaki> .. references from web content items, e.g. entries in multilingual thesauri or authority tables etc.
09:13:48 <fsasaki> .. enrichment of web content with HQ information, to improve MT, localization workflows etc.
09:14:13 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
09:14:38 <fsasaki> topic: presentation from Europeana (Juliane Stiller + Marlies Olensky)
09:15:18 <fsasaki> juliane: working on europeana 2 - multilingual access of europeana content and europeana data layer
09:15:35 <fsasaki> .. europeana facts: launched 2008, cultural heritage information system
09:15:44 <fsasaki> .. data from archives, audio visual archives, libraries
09:16:09 <fsasaki> .. build a digital library as a single access point, today access to about 23 mill objects
09:16:27 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
09:16:54 <fsasaki> .. map of Europe - main content is coming from a few European countries
09:17:12 <fsasaki> .. e.g. if metadata comes from France, metadata is in french, but object might be from a different language
09:18:06 <fsasaki> .. how does multilingual access and search work on Europeana? involves interface, 
09:18:14 <fsasaki> .. search (query translation and document translation)
09:18:31 <fsasaki> .. result presentation (enable users to assess relevance of results)
09:18:42 <fsasaki> .. and browsing, important for cultural heritage domain
09:19:01 <fsasaki> .. people need to be able to "find the unknown"
09:19:23 <fsasaki> .. Europeana: static interface is translated into 26 different languages
09:19:40 <fsasaki> .. query translation prototype developed for 10 European languages
09:19:59 <fsasaki> .. a document can be translated after being found, via MS translation API
09:20:06 <fsasaki> .. now work on semantic data layer
09:20:21 <fsasaki> .. multilingual alignment of controlled vocabularies etc.
09:21:10 <fsasaki> marlies: edm comprises cross-domain metadata - library, archive, museum
09:21:36 <fsasaki> .. edm is a roof for different domains and levels of granularity 
09:22:05 <fsasaki> .. europeana is a cross-domain framework, using SKOS, CIDOC-CRM etc.
09:22:15 <fsasaki> .. can use then specific parts e.g. from museum domain
09:22:39 <fsasaki> .. basic distinction: "provided item" vs. its digital representation, plus metadata record
09:22:47 <fsasaki> .. allow for multiple records of one object
09:23:08 <fsasaki> .. composition of objects, important for library and archives domain
09:23:17 <fsasaki> .. can re-present contextual resources
09:23:29 <fsasaki> .. a metadata format that can be specialized
09:24:36 <fsasaki> .. edm case studies, see http://pro.europeana.eu/web/guest/case-studies-edm
09:25:02 <fsasaki> .. LOD pilot with 2.4 mill objects, contributions from spain, norway, austria, sweden, belgium
09:25:40 <fsasaki> .. places or items are mapped to places 
09:25:46 <fsasaki> .. multilinguality and EDM:
09:26:04 <fsasaki> .. semantic data is multilingual, see data cloud developed in europeana connect project
09:26:15 <fsasaki> .. different vocabularies are aligned with each other
09:26:36 <fsasaki> .. not only multilingual vocabularies will allow for multilingual search results
09:26:54 <fsasaki> .. we also align monolingual results by a pivot vocabulary 
09:27:47 <fsasaki> .. language tags play a role in the edm too - labels in different languages
09:28:23 <fsasaki> .. now example how Europeana portal deals with multilinguality
09:28:51 <fsasaki> juliane: example search for "cheval" - result list, facets, filtered by language
09:29:09 <fsasaki> .. metadata is in czech
09:29:47 <fsasaki> .. cheval is not in the metadata - result was found because the metadata fields were enriched with different language versions including a thesaurus with the term "cheval"
09:30:09 <fsasaki> .. with multilingual enrichment of vocabularies you can enhance multilingual search
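The "cheval" example above can be sketched roughly as follows: records whose metadata is in one language are indexed under every language label of the thesaurus concept they match, so a query in another language still finds them. The records, the thesaurus content and all function names here are hypothetical stand-ins, not Europeana's actual data or implementation.

```python
from collections import defaultdict

# Hypothetical thesaurus: one concept with labels in several languages.
THESAURUS = {
    "concept:horse": {"cs": "kůň", "fr": "cheval", "en": "horse", "de": "Pferd"},
}

# Hypothetical records whose original metadata is in Czech only.
RECORDS = {
    "rec1": {"title": "Kůň na pastvině", "subjects": ["kůň"]},
    "rec2": {"title": "Stará mapa Prahy", "subjects": ["mapa"]},
}


def build_index(records, thesaurus):
    """Index each record under its own subject terms plus all labels of matching concepts."""
    label_to_concept = {
        label.lower(): concept
        for concept, labels in thesaurus.items()
        for label in labels.values()
    }
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for subject in rec["subjects"]:
            term = subject.lower()
            index[term].add(rec_id)
            concept = label_to_concept.get(term)
            if concept is not None:
                # Enrichment step: add every language version of the concept.
                for label in thesaurus[concept].values():
                    index[label.lower()].add(rec_id)
    return index


def search(index, query):
    """Return the sorted ids of records indexed under the query term."""
    return sorted(index.get(query.lower(), set()))


INDEX = build_index(RECORDS, THESAURUS)
print(search(INDEX, "cheval"))
```

The French query matches the Czech record only because of the enrichment step; a term with no thesaurus entry (such as "mapa" here) stays reachable in its original language only.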
09:30:24 <fsasaki> .. we are now working on how to present this to the user
09:30:30 <fsasaki> .. summary: europeana is very multilingual
09:30:43 <fsasaki> .. multilingual "metadata + object + user"
09:30:57 <fsasaki> .. hard to retrieve objects but also to present to a multilingual audience
09:31:27 <fsasaki> topic: "setting the stage" - QA
09:31:32 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
09:32:37 <fsasaki> daveL: question on DavidO - easy multilinguality on the web - will that be a source of a big change?
09:32:40 <fsasaki> davidO: yes
09:33:04 <fsasaki> .. MT has been very active, statistical approaches created a new generation of MT
09:33:14 <fsasaki> .. now statistical MT has reached maximum potential
09:33:25 <fsasaki> .. now there is an opportunity to apply further techniques
09:33:46 <fsasaki> .. not only for translation of a piece of content, but also for acquisition of content
09:33:49 <fsasaki> .. e.g. for training
09:34:11 <fsasaki> .. additional techniques to differentiate between speakers, understand what not to try to transcribe
09:35:01 <fsasaki> paul: talking about translation and multilinguality, in sense of terms / multilingual alignment / multilingual resources
09:35:31 <fsasaki> .. what is the role of e.g. LR? what role does NLP play in Europeana?
09:35:44 <fsasaki> .. e.g. beyond terms: lexical resources, morphological information
09:36:03 <fsasaki> juliane: very important, in Europeana connect we built language resources
09:36:20 <fsasaki> .. we were looking for resources to implement cross-lingual search
09:36:38 <fsasaki> nicoletta: we should discuss very carefully about the role of LR with respect to MLW, SW
09:36:50 <fsasaki> .. there are so many dimensions one should touch
09:37:16 <fsasaki> .. at the moment (in MLW-LT) we discuss metadata - "content" in Europeana is another level that touches lexicon etc.
09:37:25 <fsasaki> .. there is also big data in our field
09:37:53 <fsasaki> .. so there are so many dimensions - the role of LR in big data environment needs to be discussed, including policy issues
09:38:16 <fsasaki> paul: policy issues in terms of standardization in the EU context are important too 
09:38:36 <fsasaki> .. how do we deal with standardization, making linked data (content, lexicon / linguistic) "official"
09:39:23 <fsasaki> kimmo: very important that we don't kill emerging new activity by standardizing
09:39:46 <fsasaki> .. previously things have been killed by standardization
09:40:12 <fsasaki> .. our practical approach in MLW-LT and other projects has been: we impose standardization by example, not by conditions and rules
09:40:26 <Jirka_> Jirka_ has joined #mlwDub
09:40:41 <fsasaki> .. has the disadvantage that it is slightly chaotic, but would still speak in favour of involving people in doing something
09:40:59 <fsasaki> .. e.g. EU publications office should be the lead in standardizing their work
09:41:35 <fsasaki> .. that might become a part of a standard or not, and need to link this to standards work
09:42:23 <fsasaki> philArcher: talked to EDM people about how the EDM could become a part of W3C standardization, also have a lot of questions to peter, will take that offline
09:42:59 <fsasaki> thierry: also important to see if we need other standards like LMF for lexicons etc.
09:43:48 <RyanHeart> Question: what is the one take-away from this session?
09:43:52 <fsasaki> arle: question for davidO - how to build the bridge from current efforts we saw so far to your vision?
09:44:44 <fsasaki> davidO: competition will drive adoption of models
09:45:06 <fsasaki> .. this is just a start of a conversation - Europeana is a wonderful initiative
09:45:23 <fsasaki> .. would love to see it to evolve to opportunities that are commercial as well
09:45:44 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
09:45:52 <gderiard> gderiard has joined #mlwDub
09:46:03 <fsasaki> alexanderLik: why not use regulation as a vehicle for industries to implement things
09:46:43 <fsasaki> Ioannis: who will pay for it?
09:46:49 <fsasaki> .. things need to be market driven
09:47:01 <fsasaki> .. if there is no carrot nothing will happen
09:48:02 <fsasaki> olaf: to all the speakers - what is the role of the crowdsource for metadata gathering, metadata definition etc.
09:48:05 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
09:49:23 <fsasaki> juliane: doing that in Europeana to some extent - looking into user logs etc.
09:49:48 <fsasaki> now break
09:49:49 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
10:00:42 <fsasaki> before break about "qero" project (link to be provided later)
10:17:19 <Arle> Dave Lewis: This section is about linking data.
10:17:24 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
10:17:28 <Yves_> Yves_ has joined #mlwDub
10:17:47 <fsasaki> topic: presentation from Sebastian Hellmann
10:17:57 <Arle> Sebastian Hellmann: researcher at Uni Leipzig. Will be talking about linked data for NLP and Web annotation. This is a broad topic. I will point to projects as an overview.
10:18:19 <fsasaki> scribe: Arle
10:18:47 <Arle> ..Motivational slide: lots of walled gardens. This is the way it was before RDF. There are many beautiful gardens, but you can't go between them. I want to talk about turning walled gardens into networks of parks.
10:19:24 <Arle> ..How do we leverage linked data for NLP? They cover many domains. The data is crowdsourced. This is the background.
10:19:33 <Arle> ..RDF is about semantic interoperability.
10:19:49 <Arle> ..Third factor is making the output of NLP available on the web.
10:20:19 <Arle> ..Slide shows huge number of Linked Open Data repositories. Currently linguistic data is one part under cross-domain.
10:20:56 <omstefanov> omstefanov has joined #mlwDub
10:21:39 <Rob> Rob has joined #mlwDub
10:22:15 <Arle> ..Linguistic LInked Open Data Cloud: Links many areas. How do you fund this? Difficult to fund any one institution. There is a time horizon on funding: may lead to death of projects.
10:22:30 <Arle> ..Funding for cloud remains difficult
10:22:36 <Arle> s/LInked/Linked?
10:22:42 <Arle> s/LInked/Linked/
10:23:14 <Arle> DBPedia includes eight interlinked language versions. Individual language data is available.
10:23:27 <r12a> r12a has joined #mlwdub
10:23:29 <Arle> s/DBPedia/..DBPedia/
10:24:25 <Arle> ..Wiktionary2RDF: Communities create wrappers (made by domain experts). Converted to Lemon via Mediator. Anyone can join the community: http://dbpedia.org/Wiktionary
10:25:07 <Yves_> Yves_ has joined #mlwDub
10:25:54 <Arle> ,,Web Technologies for integrating NLP tools and approaches. Once you are immersed in a technology, you don't see other solutions and start trying to apply it where it doesn't apply. There are cases where RDF makes sense, but others where relational databases make sense. Learn *when* to use it.
10:26:28 <Arle> ..My solution: RDF allows linking between walled gardens. It has certain properties other data models do not provide.
10:26:35 <Arle> s/,,Web/..Web/
10:26:38 <omstefanov> Says RDF is the way to link up different gardens, but not why
10:27:34 <Arle> ..Advantages: URIs available, formal documentation (like UML), easy-to-understand structure, many tools (e.g., LOD2 Stack), indexing and querying allow big picture.
10:28:17 <Arle> ..NLP Interchange Format (NIF) aims at interoperability between NLP, language resources, and annotations.
10:28:40 <Arle> ..First released September 2011. Open project. Growing with feedback.
10:29:01 <Arle> ..NIF allows interlinking between various tools (slide shows structure and tools).
10:29:58 <Arle> ..No current standard mechanism to connect WWW, Giant Global Graph (GGG), and NLP. There is no way to combine the three.
10:31:02 <Arle> ..Want to allow annotation by various tools. Also human annotation (links, free text, correction of NLP annotations)
10:31:39 <Arle> ..But all this does not work together. It has the walled garden problem still. Semantic Web is supposed to fix this, but a lot of work remains.
10:32:10 <Arle> ..Showed example of how to make it work.
10:32:23 <Arle> ..Feel free to join.
10:32:56 <Arle> topic: presentation from Dominic Jones
10:34:00 <Arle> Dominic Jones: Want to start with an info graphic to show the world by nationality. Want to add to it traditional print media and also user-generated content (on electronic devices). These types of content are very different. What we produce is somewhere in the middle.
10:34:22 <Arle> ..Issue is the challenge of how to localize this stuff.
10:35:14 <Arle> ..Compare Flickr, Reddit, etc. Raises issues: provenance, access control (linked *open* data vs. linked data—this may be a blocking issue)
10:36:14 <Arle> ..Architecture based on CMS-LION, uses XLIFF messaging between various components. What we add is an RDF model of translation, provenance, CNGL service and content models.
10:36:51 <Arle> ..These models represent data we deal with.
10:37:42 <Arle> ..Book of Kells is here at Trinity, written on calf skin, in a big glass case. You can't access it yourself, relies on gatekeepers. Compare it to the iPad, where the consumer becomes the producer.
10:38:37 <Arle> ..CMS-LION emphasizes user-generated content. Compare to "telephone" game: our system lets us know who changes what in the translation, what happens in QA, what is consensus, etc.
10:40:04 <Arle> ..Will show an example of a tweak. We break it into the content model, and into a job. Provenance is critical in working with things. We use a lightweight version of the Open Provenance Model.
10:41:40 <Arle> ..We have Artifacts, Processes, and Agents. Able to map process diagram to these things. This process allows us to enrich the translation model with information on who did things, and how.
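The Artifacts / Processes / Agents split described above can be sketched as a small data model. This is a minimal illustration only: the class and field names loosely mirror the Open Provenance Model edges (used, generated, controlled by), and the job, segment ids and agent names are hypothetical. It is not the CMS-LION model itself, which the speaker describes as RDF-based.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Agent:
    name: str


@dataclass(frozen=True)
class Artifact:
    ident: str
    text: str


@dataclass
class Process:
    kind: str              # e.g. "machine-translation", "post-edit"
    controlled_by: Agent   # who carried out the step
    used: Artifact         # input artifact
    generated: Artifact    # output artifact


@dataclass
class ProvenanceLog:
    processes: List[Process] = field(default_factory=list)

    def record(self, kind: str, agent: Agent, used: Artifact, generated: Artifact) -> Artifact:
        """Store one process step and return the artifact it generated."""
        self.processes.append(Process(kind, agent, used, generated))
        return generated

    def history(self, artifact: Artifact) -> List[Tuple[str, str]]:
        """Walk back from an artifact to its source; return (kind, agent) steps, oldest first."""
        steps = []
        current = artifact
        while True:
            step = next((p for p in self.processes if p.generated == current), None)
            if step is None:
                break
            steps.append((step.kind, step.controlled_by.name))
            current = step.used
        return list(reversed(steps))


# Hypothetical translation job: source segment -> MT output -> post-edited text.
log = ProvenanceLog()
source = Artifact("seg1-en", "Hello world")
mt = log.record("machine-translation", Agent("mt-engine"), source,
                Artifact("seg1-fr-mt", "Bonjour monde"))
final = log.record("post-edit", Agent("translator-1"), mt,
                   Artifact("seg1-fr-pe", "Bonjour le monde"))
print(log.history(final))
```

Walking the chain backwards from the final text answers the "who changed what" question from the talk: each step names the process and the agent responsible for it.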
10:42:24 <Arle> ..We are integrating CMS-LION with Panacea. Focus on post-edits to retrain MT systems. Also tying with LSP (VistaTEC).
10:42:47 <nico> nico has joined #mlwDub
10:43:10 <Arle> ..Will use CMS-LION as a test-bed for ITS 2.0 and tie it in with Solas (Limerick test bed for workflow orchestration).
10:44:13 <Arle> ..Now we are at the intersection of Multilingual Semantic Web, Language Resources, and Localization. This is MLW LOD.
10:44:54 <fsasaki> topic: presentation by Jose Emilio Labra Gayo
10:44:57 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
10:45:48 <Arle> ..My work is in multilingual LOD. We translated product schemes and procurement vocabulary in EC projects.
10:46:46 <Arle> Jose Labra: We have HTML, human readable, but how do we move from that to machine-readable that is intrinsically multilingual.
10:47:02 <Arle> s/..My work/Jose Labra:My work/
10:48:29 <Arle> ..There is data and there is *multilingual* data. We need to account for human readable information (e.g. "professor" vs. "catedrático"). Moving from this to machine-readable is a challenge.
10:49:12 <Arle> ..Want to talk about best practices impacted by multilinguality. Have 8 best practices.
10:49:41 <Arle> ..1. Design a good URI scheme. Cool URIs don't change, identify things, are human-readable.
10:49:43 <PhilA> Note to self - Gov Linked data WG Best Practices on Linked Data needs to include section on multilingualism - ref. Juan's talk
10:50:24 <Arle> ..e.g. dbpedia.org/resource/Spain is good.
10:51:09 <Arle> ..I'm not sure if internationalized URIs are good or not. Can create problems with phishing, limited support, and human readability across languages.
10:52:38 <Arle> ..2. Model resources, not labels. URIs should map to contents, not to particular labels. We don't want to map to different language labels. Use universal pointer with RDF labels to language specific versions. But can cause problems with thesauri. SKOS uses URI-identifiable labels.
10:53:16 <Arle> ..Question: What happens if we want to use localized URIs. Perhaps using language identifiers in the URI is good, but I don't know.
10:53:49 <Arle> ..3. Use human-readable information. Machine information can also be human readable.
10:54:17 <Arle> ..Question is how to balance between human readable and RDF world.
10:55:13 <Arle> ..4. Use labels for all the entities you model, not just concepts, not just main entities. Displaying labels is easier if you don't have to make multiple requests.
10:56:12 <Arle> ..Problems: Selecting the proper label. Only 38% of non-information resources have labels. Also, avoid camel case or similar notations in labels. "UniversityOfOviedo" is a bad label.
10:58:07 <Arle> ..5. Use multilingual literals. IETF lets you select the right tag. But multilingual literals can create problems. The right technology can deliver less than ideal results. E.g., SPARQL works with labels; what happens when you use a language-bound query (e.g., for "Professor" without a language tag)? Need to create a default label with no language tag.
10:58:44 <Arle> ..This is currently unused (only 4.78% of info-resources use a language tag, and only 0.7% use more than one.)
10:59:44 <Arle> ..We need to balance between RDF and XML and be aware of consequences of mixing. This is a challenge.
11:00:36 <Arle> ..6. Use content negotiation. Use Accept-Language. Without it we end up returning too much data. Allows you to get labels in the language you want.
11:01:07 <js> js has joined #mlwDub
11:02:33 <Arle> ..7. Include labels without a language tag. This makes it easier for SPARQL queries. Need to know what the default language is. Is there a way to declare the primary language of an RDF data set?
11:03:35 <Arle> ..8. Use multilingual vocabularies. Claimed that they should include descriptions in more than one language, but most do not. Also what to do when not localized?
11:04:10 <Arle> ..Raises issues of when categories don't map precisely across languages.
11:04:50 <Arle> ..Some other issues: Unicode support. Microdata doesn't allow language declarations. Internationalization not covered in RDF.
11:04:59 <Arle> ..LOD offers new challenges.
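Practices 5 to 7 (language-tagged literals, Accept-Language negotiation, and a default label without a language tag) fit together roughly as in the following Python sketch. It is a simplified illustration under stated assumptions: the labels dictionary and both function names are hypothetical, and the matching is a plain prefix fallback rather than a full implementation of the IETF language-matching rules.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, most preferred first."""
    prefs = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # default weight when no q= parameter is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            prefs.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(prefs)]


def pick_label(labels, accept_language):
    """labels maps a lowercase language tag (or None for untagged) to a literal."""
    for tag in parse_accept_language(accept_language):
        if tag in labels:
            return labels[tag]
        primary = tag.split("-")[0]  # "es-es" falls back to "es"
        if primary in labels:
            return labels[primary]
    return labels.get(None)  # default label without a language tag


# One resource, several labels; the untagged entry is the default.
labels = {None: "Professor", "en": "Professor", "es": "Catedrático"}

print(pick_label(labels, "es-ES,es;q=0.9,en;q=0.5"))
print(pick_label(labels, "fi"))
```

The second call shows why the untagged default matters: a client asking for a language the data set does not cover still gets a usable label instead of nothing.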
11:05:13 <gderiard> gderiard has joined #mlwDub
11:05:40 <Arle> topic:Question and Answer
11:06:50 <Arle> Question: José, you recommended using literals without language tags at all. But what happens when the literal can mean different things in different languages? E.g., Gift in English is very different than Gift in German.
11:07:26 <Arle> José Labra: These are difficult issues. There are practices to model lexicons and separate concepts from labels.
11:08:36 <Arle> Sebastian Hellmann: The URIs are not human readable if you do not use IRIs. But then you have to use % encoding which is impossible to read. In DBpedia we use IRIs. We think libraries should support IRIs.
11:09:06 <Arle> Maxime Lefrançois: I have a question for Dominic. Do you use the W3C Provenance concepts?
11:09:34 <Arle> Dominic Jones: The Open Provenance Model predates the W3C work and gave rise to it, but we chose it as an off-the-shelf solution.
11:10:24 <Arle> Thierry Declerck: Question about terminology (did not catch it)
11:11:23 <Arle> Tadej Štajner: For Sebastian. I've been following the NLP to RDF work. Is there any work on encoding this in an inline format directly in a document. Some of our use cases require this rather than storing them separately.
11:12:08 <Arle> Sebastian: It might be possible, but it is difficult in general. It is hard for any annotation format. Maybe easier with RDFa. We'll have to discuss more.
11:13:14 <Arle> Pedro Diez: Maybe we need to make a distinction between the kinds of data we are trying to link. We need a map without ambiguity to link linguistic data and general lexicons. Right now it is difficult to link to concepts with different literals across languages.
11:14:36 <Arle> ..Regarding this, maybe we need to make distinctions between different kinds of data: brands, names, telephone numbers, words. Most work we can reuse are lexical databases. They represent hard work.
11:15:15 <Arle> Maxime Lefrançois: In the Linked Open Data Cloud, is there work for linguistic. What are the links between ???
11:15:47 <Arle> Sebastian: It is a draft right now, but for the Linguistic Linked Open Data Cloud, it is more of a vision.
11:16:05 <Arle> ..We hope the original LOD Cloud will get more interactive over time.
11:28:08 <SDL_BDM_IRL> SDL_BDM_IRL has joined #mlwDub
11:29:34 <Test> Test has joined #mlwDub
11:44:22 <teddy> teddy has joined #mlwDub
11:44:57 <teddy> teddy has left #mlwDub
12:05:13 <n> n has joined #mlwDub
12:13:47 <Ralph> Ralph has joined #mlwDub
12:15:35 <Jirka_> scribe: Jirka
12:16:56 <RRSAgent> RRSAgent has joined #mlwDub
12:16:56 <RRSAgent> logging to http://www.w3.org/2012/06/11-mlwDub-irc
12:17:04 <Jirka_> topic: Linked Open Data and the Lexicon session chaired by Arle Lommel
12:17:17 <Jirka_> scribe: Jirka_
12:17:45 <Jirka_> rrsagent, draft minutes
12:17:45 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html Jirka_
12:18:13 <leroy> leroy has joined #mlwDub
12:18:39 <Ralph> Ralph has left #mlwDub
12:19:11 <Jirka_> topic: Bringing Terminology to Linked Data through TBX by Alan Melby
12:19:21 <Yves_> Yves_ has joined #mlwDub
12:20:03 <Rob> Rob has joined #mlwDub
12:20:05 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
12:21:01 <Jirka_> Alan: one of objectives of workshop is to bridge gap between traditional terminology and LOD/LD efforts
12:21:58 <Jirka_> ... TBX/RDF
12:22:36 <Jirka_> ... TBX is TermBase eXchange standard
12:22:38 <omstefanov> omstefanov has joined #mlwDub
12:23:18 <Jirka_> ... not single language, family of languages called dialects
12:24:02 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
12:24:09 <philr> philr has joined #mlwdub
12:25:48 <Jirka_> ... TBX/RDF is isomorphic mapping of TBX to RDF
12:26:03 <Jirka_> ... allows lossless bidirectional conversion
12:26:22 <Jirka_> ... why to have TBX/RDF?
12:26:39 <Jirka_> ... there is a lot of knowledge in TBX, it should be part of LD
12:27:13 <Jirka_> ... LD can benefit from term disambiguation
12:27:35 <Jirka_> ... access to well-established knowledge engineering community
12:27:59 <Jirka_> ... provides concept-based information for translation
12:28:19 <Nathan_Rasmussen> Terminology industry uses TBX for interchange, but
12:28:47 <Nathan_Rasmussen> it is not well suited to an online, open style of data exchange like RDF is
12:29:10 <Nathan_Rasmussen> thus this project is for terminologists to benefit from LD as well as the other way around.
12:29:24 <Jirka_> ... URIs for datacategories are in www.isocat.org
12:29:54 <Jirka_> ... TBX/RDF uses XML+RDFa1.1
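What "isomorphic mapping" with "lossless bidirectional conversion" means can be shown with a toy round trip in Python. This is not the TBX/RDF mapping itself: the entry layout, the predicate names (subjectField, term) and both functions are hypothetical stand-ins, and terms within one language are treated as an unordered set.

```python
def entry_to_triples(entry):
    """Flatten one concept entry into (subject, predicate, object) triples."""
    concept = entry["id"]
    triples = {(concept, "subjectField", entry["subjectField"])}
    for lang, terms in entry["terms"].items():
        for term in terms:
            # The object carries the term together with its language tag.
            triples.add((concept, "term", (term, lang)))
    return triples


def triples_to_entry(triples):
    """Rebuild the concept entry from its triples."""
    entry = {"terms": {}}
    for subject, predicate, obj in triples:
        entry["id"] = subject
        if predicate == "subjectField":
            entry["subjectField"] = obj
        elif predicate == "term":
            term, lang = obj
            entry["terms"].setdefault(lang, []).append(term)
    for terms in entry["terms"].values():
        terms.sort()  # sets are unordered, so normalise term order
    return entry


# Hypothetical concept-oriented entry with terms in two languages.
entry = {
    "id": "c42",
    "subjectField": "manufacturing",
    "terms": {"en": ["gasket", "seal"], "de": ["Dichtung"]},
}

round_tripped = triples_to_entry(entry_to_triples(entry))
print(round_tripped == entry)
```

The conversion counts as lossless only if the round trip returns an entry equal to the original; any data category that cannot be expressed as a triple would break that property.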
12:32:49 <Jirka_> topic: Managing Director, Interverbum Technology by Ioannis Iakovidis
12:33:35 <Arle> Arle has joined #mlwDub
12:33:48 <Jirka_> Ioannis: Our company develops TermWeb - terminology management system
12:35:24 <Trant> Trant has joined #mlwDub
12:37:50 <Jirka_> Ioannis describes what tool can do and how complex terminology workflow could be
12:39:00 <Jirka_> ... terminology is everywhere
12:39:18 <paulb> paulb has joined #mlwdub
12:40:01 <r12a> r12a has joined #mlwdub
12:40:14 <r12a> r12a has left #mlwdub
12:40:16 <Jirka_> ... challanges - integration with another tools (no standard API)
12:40:24 <r12a> r12a has joined #mlwdub
12:40:27 <Jirka_> ... term identification and tagging
12:41:38 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
12:47:56 <Jirka_> ... some standard/convention is highly desirable
12:48:32 <Jirka_> topic: Extending the Use of Web-Based Terminology Services by Tatiana Gornostay
12:49:37 <mlefranc> mlefranc has joined #mlwdub
12:50:36 <Jirka_> Tatiana is from TILDE company, providing localization, translation, ... mainly for LV, LT, ET and RU
12:50:59 <Jirka_> ... they have to deal with terminology every day
12:51:46 <Jirka_> ... efficient communication requires terminology
12:54:06 <Jirka_> ... terminology is bridging language and semantic technologies
12:54:40 <Jirka_> ... eurotermbank.eu - 2nd largest term. database in Europe
12:57:38 <Jirka_> ... describes Accurat & TTC tools/projectd
12:57:45 <Jirka_> s/projectd/projects/
12:58:07 <Jirka_> ... TaaS - Terminology as a service
13:00:09 <Yves_> s/challanges/challenges/
13:01:46 <Jirka_> ... terminology can enhance automation in LOD
13:02:25 <Jirka_> ... terminology helps in automating work with multi/cross-lingual metadata
13:02:45 <nico> nico has joined #mlwDub
13:03:09 <Jirka_> topic: The Need for Lexicalization of Linked Data by John McCrae
13:08:13 <Jirka_> ... PYTHIA - ontology based question answering system
13:08:17 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
13:09:36 <Jirka_> ... proper rdfs:label is only on about 2% of content
13:11:05 <Jirka_> ... labels are very ambiguous
13:12:16 <labra> labra has joined #mlwdub
13:13:25 <Jirka_> ... created lexicon model relative to ontologies
13:13:40 <Jirka_> ... built on ISO 24613 and SKOS
13:14:12 <dF> dF has joined #mlwDub
13:14:43 <Sebastian> Sebastian has joined #mlwDub
13:14:59 <Sebastian> Sebastian has left #mlwDub
13:14:59 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
13:15:36 <Jirka_> ... further development under W3C Ontolex CG
13:18:29 <Jirka_> topic: Cool URIs Are Human Readable by Phil Archer
13:19:32 <Jirka_> ... ISA - Interoperability Solutions for European Public Administrations
13:20:46 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
13:22:58 <Jirka_> Phil describes what interoperability means in terms of term
13:23:23 <Jirka_> ... unique identifier of term is very important for interop
13:25:21 <Jirka_> ... domain names used in URI should be neutral
13:29:57 <Jirka_> ... providing equivalent localized URIs is friendly for users but adds more work for publishers and needs more power for processing
13:30:42 <Jirka_> topic: Q&A
13:30:58 <Jirka_> rrsagent, draft minutes
13:30:58 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html Jirka_
13:33:18 <Jirka_> Pedro: Dejavu - we had similar problem 18yrs ago.
13:35:19 <Jirka_> Tatiana: IMHO natural language can't be formalized. ...
13:35:22 <Jirka_> Kimmo
13:36:15 <Jirka_> Kimmo: EU Commission has been trying to do this for years. Formalization of concepts has limits.
13:37:16 <Jirka_> Felix: 18yrs ago there was no Web.
13:38:44 <Jirka_> Arle: Hopefully we are at inflection point now and we will see rapid progress.
13:40:42 <Jirka_> Kerstin: Some terms are translated only in some languages.
13:41:25 <Jirka_> Alexander: What to translate and what not to translate can be held in metadata.
13:43:44 <mhellwig> mhellwig has joined #mlwdub
13:43:44 <Jirka_> Tadej: RDF is unable to express that something is translation.
13:46:37 <Marion_Shaw> Marion_Shaw has joined #mlwDub
13:46:39 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
13:48:32 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
13:48:40 <PMacAree> PMacAree has joined #mlwDub
13:49:50 <tadej> scribe: tadej
13:50:06 <tadej> topic: Identifying Users and Use Cases - Matching Data to Users
13:50:23 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
13:51:52 <tadej> DaveL: After an interesting discussion, it's time to draw out concrete use csaes
13:51:58 <tadej> s/csaes/cases
13:52:00 <tadej> s/csaes/cases/
13:53:40 <tadej> Thierry: Can you give us short descriptions of the use cases for the technologies and standards described today? What are the business cases, how can language LOD improve business?
13:54:56 <tadej> PeterS: There are several in the legal domain and public services.
13:55:10 <tadej> ... From the transparency side, equivalence of languages is important. 
13:55:56 <tadej> ... For the open data, we have to come back to this
13:56:15 <tadej> Thierry: Is there a concrete requirement for one of those use cases?
13:57:14 <tadej> PeterS: There were interactions, a lot of feedback comes from institutions that use the technology, as well as from the EU member states, but more on the formal side. There is less informal feedback from the community. 
13:57:17 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
13:58:34 <tadej> Julianne: For europeana, we are gathering feedback from not just librarians in the community, but also wider public via the Europeana Connect initiative.
13:59:13 <tadej> ... In terms of multilingual uses, we did access log analyses and we indeed have multilingual users, but the majority are monolingual users who need to be able to use it. 
13:59:38 <tadej> Thierry: How was the access implemented? SPARQL?
14:00:10 <tadej> Julianne: For now, the interface is via downloading a dump, the rest is under implementation. A multilingual interface is difficult to implement. 
14:00:51 <tadej> ... With regard to users, there are several groups in terms of requirements: some come from education, professionals and general public.
14:01:42 <tadej> fsasaki: Are users from the localization/internationalization area a new use case for you?
14:02:51 <tadej> Julianne: We are aiming to have cross-lingual descriptions for our resources also for other use cases.
14:03:57 <tadej> PeterS: We are at the end of the chain, the data gets generated upstream.
14:04:36 <dgroves> dgroves has joined #mlwdub
14:06:27 <dgroves> dgroves has left #mlwdub
14:06:47 <dgroves> dgroves has joined #mlwDub
14:07:23 <tadej> HorstKraemer: We work with the german government on e-government projects. We started by using Drupal, and generally the open source CMSes are a big trend in this space.
14:07:45 <tadej> ... the LOD, inferences, semantic technologies still need a lot of acceptance management on the e-government side.
14:08:56 <tadej> philr: We are very interested in using RDF for integrating various info sources. We want to provide a full picture of the data, currently we do this via synchronizing the disparate sources, which is complicated.
14:08:57 <fsasaki> s/HorstKraemer/SebastianSklarß/
14:09:11 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
14:09:26 <tadej> ... We see opportunity in using RDF for this kind of data integration purposes.
14:10:21 <tadej> Pedro: We work in localisation. We have 2 scenarios where we need this metadata. 
14:11:04 <tadej> ... 1) real-time processing of metadata for normal translation workflows
14:11:19 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
14:11:46 <tadej> ... 2) real-time translation, which happens synchronously on page visit. Here, the client needs to be able to influence the process of serving RT translated content.
14:13:58 <tadej> TatianaG: At Tilde, terminology management is an invaluable source for automatic cross-lingual annotation. 
14:15:01 <tadej> AlanMelby: What is the status of expressing terminology in ITS1.0?
14:15:26 <tadej> Yves_: There is a termInfo* family of attributes which express that.
14:16:32 <tadej> AlanMelby: What about looking at meanings and not just strings of characters - ambiguous queries in information retrieval can be solved via terminology disambiguation.
14:17:24 <tadej> ... A semi-automatic term concept suggestion could solve this, a human can quickly scan and link documents to existing terminology databases. 
14:17:49 <tadej> ... One of the sticking points are the ontologies: how to name certain subject fields? 
14:18:19 <tadej> ... You can get a long ways with the current state of the terminology resources.
14:19:21 <tadej> fsasaki: Are there any things that customers look for? Any workflows that need to be supported?
14:21:22 <tadej> AlexanderLik:  We don't operate our solutions on the web, but internally. If the new standard supports DITA-based XML, it's easy for us to adapt. A new format would be prohibitive for us. 
14:22:33 <tadej> Pedro: We have clients with public and private areas of information. If you can't keep personal information in the sources, you also can't keep these things in translation memories.
14:22:48 <tadej> ... These things could be better controlled by using a tag set for that.
14:23:25 <tadej> Thierry: Are there any other aspects about security in Linked Data?
14:24:40 <tadej> Pedro: If the clients need special security measures, the server needs to be located at the client premises, so that is also an operational issue for us. 
14:25:16 <tadej> paulb: There are several people at DERI working on access control on linked data. There is demand for this functionality from commercial entities. 
14:25:48 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
14:25:50 <tadej> Arle: Pedro's concern is legitimate: not all Linked Data is Linked Open Data. This is an important aspect for many users. 
14:26:49 <tadej> PhilA: LD != LOD. W3C is aware of that; last week at the Data Forum, a paper presented an approach that had an ACL flag for every triple.
14:27:00 <tadej> ... a lot of solutions are coming.
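The idea of an ACL flag on every triple can be pictured with a small Python sketch. The details of the paper mentioned above are not in this log, so everything here is an assumption: the Triple type, the group names and the visible_triples function are hypothetical, and "public" is simply used to mark open data.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    acl: frozenset  # groups allowed to read this triple


def visible_triples(triples, groups):
    """Return the triples readable by a user who belongs to `groups`."""
    allowed = set(groups) | {"public"}  # everyone can read public triples
    return [t for t in triples if t.acl & allowed]


# Hypothetical mixed data set: one open triple, one restricted to a client.
data = [
    Triple("term:gasket", "label", "gasket@en", frozenset({"public"})),
    Triple("term:gasket", "clientNote", "unreleased product", frozenset({"client_a"})),
]

print(len(visible_triples(data, [])))
print(len(visible_triples(data, ["client_a"])))
```

With a per-triple flag, the same store can serve Linked Open Data to anonymous users and additional Linked (closed) Data to authorised ones.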
14:27:27 <tadej> fsasaki: The LOD working groups are also gathering requirements on that aspect. 
14:27:47 <fsasaki> s/LOD/LOD platform WG/
14:27:48 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
14:28:00 <fsasaki> see http://www.w3.org/2012/ldp/charter for more info
14:29:02 <tadej> ??: When converting structured XML data into more redundant RDF data structures, clients need arguments for taking that approach.
14:29:42 <tadej> Thierry: What is the benefit of LOD as opposed to plain XML? Demonstrating business value is important. 
14:30:36 <Micha> Micha has joined #mlwDub
14:30:37 <tadej> DesOates: In Adobe, we already have a "Linked Closed Data" corpus of terms. It's expensive to maintain and manage. Our problem is not the openness, but actual applicability. 
14:31:03 <tadej> ... Developers will never look up termbases for UX or documentation. The Linked part is irrelevant if we can't get it into the software source.
14:31:46 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html tadej
14:48:34 <jimbo> jimbo has joined #mlwDub
15:06:42 <Micha_> Micha_ has joined #mlwdub
15:07:50 <dF> dF has joined #mlwDub
15:07:53 <nico> nico has joined #mlwDub
15:08:29 <dgroves> Kimmo: bridge building is often talked about but not very easy to do
15:08:37 <fsasaki> scribe: dgroves
15:08:54 <fsasaki> topic: presentation from georg rehm - META-NET and the strategic research agenda + LOD
15:08:56 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
15:09:28 <omstefanov> omstefanov has joined #mlwdub
15:10:00 <dgroves> Kimmo: we are often overwhelmed by the multitude of information. META-NET (in particular META-SHARE) is a European initiative to gather related language tools, resources under one access point
15:10:43 <dgroves> META-NET project site: http://www.meta-net.eu/
15:11:08 <dgroves> META-NET Strategic Research Agenda and LInked Open Data presented by Georg Rehm
15:11:44 <mlwDub> mlwDub has joined #mlwDub
15:11:46 <dgroves> Georg: conference June 20/21st in Brussels META-Forum
15:13:09 <dgroves> Georg: what METANET are trying to do is trying to provide each langauge community with resources in order to overcome language borders, the last remaining border
15:13:51 <dgroves> Georg: focused on under-resourced languages of Europe
15:15:13 <Yves_> s/langauge/language/
15:15:30 <Rob> Rob has joined #mlwDub
15:15:32 <dgroves> Georg: 3 lines of action: META-VISION, META-SHARE, META-RESEARCH
15:16:52 <dgroves> Georg: ELRA/ELDA are part of the consortium and have already committed to use the META-SHARE approach
15:17:04 <Pedro> Pedro has joined #mlwDub
15:17:10 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
15:18:19 <dgroves> Georg: META-NET grew from T4ME, CESAR, METANET4U, META-NORD projects
15:19:39 <dgroves> Georg: Strategic Research Agenda - document to mobilise researchers, users and providers of LT for collaboration & community building
15:20:43 <dgroves> Georg: require appropriate programme, appropriate actors and the appropriate support for an effective SRA
15:21:33 <dgroves> Georg: 3 vision groups: Translation and Localisation; Media and Information Services; Interactive Systems
15:21:46 <paulb> paulb has joined #mlwdub
15:22:55 <dgroves> Georg: produced 30 whitepapers discussing language resources and identifying gaps in terms of resources, funding etc.
15:24:27 <dgroves> Georg: Research Priority Themes are "Translation Cloud", "Social Intelligence" and "Socially Aware Interactive Assistants"
15:25:40 <dgroves> Georg: Translation Cloud is not restricted to Machine Translation, includes human translation/translators, LSPs...
15:26:39 <dgroves> Georg: Social Intelligence is concerned with multilingual text mining, discussion platforms at a European level...
15:27:18 <dgroves> Georg: Socially Aware Interactive Assistants = "super-Siri", focus on Speech Technology
15:27:34 <fsasaki_> fsasaki_ has joined #mlwDub
15:27:43 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki_
15:29:41 <dgroves> Georg: Data Challenge will be one of the challenges for Horizon 2020. It is concerned with the data value chain, open data, big data, linked data
15:31:53 <dF> dF has joined #mlwDub
15:32:05 <dgroves> Georg: Under the translation cloud, access to data (such as linked data) can be used to help improve language technologies including machine translation
15:33:41 <dgroves> Georg: Social intelligence can help provide novel methods to access data, to clean data sets...
15:34:07 <omstefanov> Where 
15:35:08 <dgroves> Georg: We need to consider data security issues and how this affects data distribution/sharing
15:35:33 <omstefanov> ... can I find out more about the Socially Aware interactive applications? Specifically speech recognition? Can't find anything useful on either http://www.meta-net.eu/forum/the-strategic-research-agenda-for-multilingual-europe/ or http://www.lt-world.org/
15:36:08 <dgroves> Georg: we need to publish various language resources (terminology, TMs, wordnets etc) as linked (open) data
15:37:37 <dgroves> Georg: META is an open strategic alliance with 600+ members with the goal of supporting the Strategic Research Agenda: http://www.meta-net.eu/join
15:37:39 <fsasaki_> omstefanov, more info is at http://www.meta-net.eu/sra - soon you'll have the SRA linked with more info about that topic and others from here
15:37:50 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki_
15:39:36 <dgroves> Kimmo: Horizon 2020 is still only a proposal, nothing is set in stone. The topics may change and the Data Challenge is not guaranteed to be a challenge and additionally there is no guarantee that language will be included if it is a challenge.
15:40:07 <dgroves> DLewis: What is the timeline for these decisions to be made?
15:40:33 <dgroves> Kimmo: By the end of next year. It's a political process
15:40:33 <labra> labra has joined #mlwdub
15:41:21 <dgroves> NickCampbell: Is the Latvian person really going to telephone someone in Portugal?
15:41:37 <dgroves> Georg: It is a vision - looking beyond what is currently possible
15:42:48 <dgroves> RSchaler: Siri isn't great, but it works. What is the real revolutionary output of what you're presenting?
15:43:55 <dgroves> Georg: The capabilities are very limited on the iPhone. However, you can't really engage in a dialogue with the system. We want to extend it to something that is following you across devices and has a profile of your preferences etc.
15:44:03 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki_
15:45:39 <RyanHeart> RyanHeart has joined #mlwDub
15:45:43 <dgroves> Q: We have  a limited number of languages. The EC/META-NET could invest in making the speech tools support some of the lesser-supported languages or help finance projects that would make speech tech. broaden in scope for more languages and more accents
15:46:44 <dgroves> Georg: Many of these tech. research gaps are identified in the language white papers.
15:46:48 <RyanHeart> So far, the language industry functions based on markets?
15:48:14 <dgroves> PhilArcher: in the W3C office in New Delhi it came across that people preferred to "speak" to the web - based on the proposed work you can really transform countries
15:49:14 <RyanHeart> Would companies have invented a keyboard if users could have talked to devices?
15:49:16 <dgroves> DLewis: We have to always be aware of the question re. what is truly innovative/novel about the work.
15:50:38 <RyanHeart> Is 'vision' about extending current technologies to other languages, platforms, environments?
15:50:55 <dgroves> Q: Clearly adding current capabilities to new languages is a huge issue. Regarding the current capabilities of the technology, are there any theoretical breakthroughs that are needed in order to accomplish the vision for 2020 or is just a matter of applying existing technologies better?
15:52:08 <dgroves> Georg: Of course there are many breakthroughs needed. One of the main components of the SRA is a roadmap document that builds timelines of what needs to be done when, with priorities identifying what problems need to be solved along the way
15:52:36 <dgroves> Georg: we have to be realistic and credible at the same time
15:53:12 <dgroves> DesOates: the key is to join the dots between the technologies. The round-tripping between the technologies still needs to be done
15:54:40 <dgroves> We need to ensure to include the link between language resources and language technologies in the SRA
15:55:13 <dgroves> Kimmo: You can have a critical discussion about this at METAFORUM
15:55:48 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
15:56:58 <dgroves> Kimmo: We should not let the fact that Google/Apple have already developed the technologies to discourage further research/development - trying them out it is clear that the problems (e.g. interactive assistant, subtitling) are not completely solved.
15:57:53 <RyanHeart> The point is not to say: Apple or Google have solved the problem (they might or might not have), but whether we can manage to come up with revolutionary new ideas for technology solutions, rather than working on extending existing ones.
15:58:06 <dgroves> Kimmo: session concluded
15:59:14 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
15:59:15 <mhellwig> scribe: mhellwig
15:59:35 <fsasaki> topic: action plan discussion led by dave lewis
15:59:37 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
16:00:17 <mhellwig> DaveL: What changes of thinking and planning do we need to advance the MultilingualWeb?
16:01:13 <mhellwig> DaveL: we have two communities (language resources and language technologies) and there are synergies between them - they share conferences and projects
16:02:00 <mhellwig> ... a lot of the activity relies on research which causes people to worry about sustainability
16:03:05 <mhellwig> ... on the other side we have the private sector, which may be side driving activities
16:04:16 <mhellwig> ... besides multilingual or multinational organisations, localisation industry which has a long tail of companies
16:04:42 <mhellwig> ... MultilingualWeb-LT is interesting, because it brings the communities together
16:07:17 <mhellwig> ... the question is: how do we address the requirements of the communities?
16:07:55 <mhellwig> ... and how do we gather information from all the communities?
16:09:48 <mhellwig> Q: Where is the data to be linked? Maybe we need an inventory of the most important data that need to be linked. Who is caring enough to pay for it?
16:11:23 <mhellwig> ??: If you talk about sustainability, what is the motivation for people to host the LOD. All participants in the LOD cloud have different incentives and that's why it works. 
16:12:22 <mhellwig> ... the publisher provides the data and then everybody can take it, use it. As long as the incentive is there LOD will continue
16:13:37 <mhellwig> dF: so far no real use case has emerged. LSP serve the publisher, but the publisher doesn't need the data link. Another problem: "here is data, now link it" is the wrong approach. It is not sustainable to link things that were created
16:14:49 <mhellwig> ??: the motivations for LOD are very diverse. one motivation to open databases could be that there are many silos in information technologies that do not integrate. So a motivation is to integrate and create interoperability. 
16:15:24 <mhellwig> ... people want to open data, data that is multilingual
16:16:28 <mhellwig> fsasaki: luxembourg workshop: what are the important parts of the infrastructure work to make the vision presented work? Best practices would help. 
16:16:48 <mhellwig> ... a project that creates best practices and uses them will be helpful
16:18:15 <mhellwig> SebastianSklarss: food agricultural organisations - started in 1980 - converted their data to make it linkable. They provide the infrastructure for linking up; they feel they have the mandate to host the data. From there you can start linking up
16:19:24 <mhellwig> ??: datawarehousing exists separate from the language industry; the data is structured to extract meaning from the information.
16:19:59 <mhellwig> ... (referring to LOD cloud) it looks like a swarm. on breakthrough to find is 
16:21:40 <mhellwig> ??: It was discussed in the Open Linked Data WG that it's not really applicable to linguistics. Persons have the same level of granularity, but this is not the case here. Secondly, privacy issues are problematic, linguistic data is natural speech
16:21:47 <Micha_> Micha_ has left #mlwdub
16:24:38 <mhellwig> Pedro: internet has three problems: how to retrieve and maintain amounts of data. Second, how do you make it accessible. Third, data maintenance and retrieval of data should be cheaper. 
16:25:00 <mhellwig> ... We have Web 2.0, now 3.0. We cannot do 4.0 without LD
16:26:04 <mhellwig> ??: Emergency cases can use LD across languages, for example a building is on fire and the cleaning lady doesn't speak English
16:26:38 <mhellwig> Arle: What do we need from localisation service providers? We need the involvement of various communities, we want to bring them together.
16:26:59 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
16:28:06 <mhellwig> ??: it's not the localisation service providers' / tool creators' fault if prices go. They have to be aware that the industries are not willing to pay more for fragmentation changes
16:29:40 <mhellwig> DaveL: "What is the motivation" is a good question. We can publish stuff, but making the links is the important point. Maintaining links could be expensive. We have to make LD useful. Data warehouses are a good example, because they found a way of making it profitable.
16:30:43 <mhellwig> dF: data hygiene is expensive. Back to the LSP: they serve publishers but there is a disconnect between the two. 
16:32:31 <mhellwig> fsasaki: data needs to be transformed. bring it into an XML format, then to an HTML format. You run into issues. The key issue of LOD in the diagram (DaveL presentation): is it only an interoperability layer or is it a layer to organise the workflows?
16:34:17 <mhellwig> dF: use cases are very important. The incentives are there, they are signs of sustainability. Often there is no consensus in the industries or no implementation commitment. Emphasis should be on real use cases. 
16:35:24 <mhellwig> ??: two motivations for LOD movement: government accountability and responsibility. And for that you need links.
16:36:00 <mhellwig> ... Public sector uses LOD. There is the UK crime map; on a heat map you can see crime. The police are the biggest users; they save money by accessing their own data. 
16:36:33 <mhellwig> ... publishing data is expensive, but cheaper than the alternative
16:37:56 <mhellwig> Thierry: you need to publish documents in the public sector, but they are neither compatible nor comparable. Linking data is best. 
16:38:34 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
16:38:50 <mhellwig> ... It's cheaper to have only one terminology database that others can link to. In the medical domain they are pushing that. We should talk to those communities to see how it's working for them
16:39:43 <mhellwig> SebastianSklarss: micropayment is missing for LOD. If it is made for machines, we need a mechanism to make machines pay for it.
16:40:40 <mhellwig> dF: micropayment is feasible.
16:41:25 <mhellwig> ... LOD should be paid for by the government. It's a kind of utility and paid for by taxes. We need to influence the policy makers. If they don't see it, then we must go for market sustainability.
16:42:34 <mhellwig> DaveL: two parts - gain public sector support with very clear use cases for the public sector. On the other hand, adding value (e.g. data warehousing) for the private sector
16:43:26 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
16:43:51 <mhellwig> Pedro: the amount of LOD is so huge that you cannot control it by one body or organisation
16:44:29 <mhellwig> DaveL: What we need to look for is use cases.
16:44:59 <mhellwig> ... Are there people in the room interested in looking at these issues?
16:45:59 <mhellwig> fsasaki: would like to add another use case. Processes that data goes through. 
16:46:42 <mhellwig> ... define provenance and the state of the translation. It's already done, but expensive. Maybe we can make it cheaper
16:47:45 <fsasaki> fsasaki: see also the linked open data working group - co-chaired by IBM with one motivation: making software development processes easier to organize, more sustainable etc.
16:49:17 <mhellwig> Nicoletta: small use case: can we link what we have in META-SHARE with the lingual data? Otherwise we risk running in different directions.
16:50:16 <mhellwig> ... One of the problems in linking the cloud: there is a double meaning of "data". There's data and language resources we use to work on the other data. 
16:51:03 <mhellwig> DaveL: [ends session; thanks participants]
16:51:45 <mhellwig> DaveL: Let's talk tonight and in the next couple of days in more detail. We should use the opportunity to move things forward. The use case focus is really important.
16:52:48 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
16:53:05 <mhellwig> Kimmo: closing remarks
16:53:51 <mhellwig> ... important comments on government involvement. We do not need to standardise more. I say no first to test you.
16:54:18 <mhellwig> ... we need to know: What do you want us to standardise? What has consent?
16:54:58 <mhellwig> ... LOD is a buzzword that is used in a rather sloppy way. We have obsessed over LOD, but it's true that LD is equally linked. 
16:55:37 <mhellwig> ... Business models - e.g. micropayment - are an important point. We need payment mechanisms, small amounts per transaction
16:55:49 <mhellwig> ... a sort of application store in a data environment
16:57:23 <mhellwig> ... "What data should be linked and why" is a good question and we should test ourselves. If we don't know, then maybe it's not a good idea
16:57:56 <mhellwig> ... The importance of the event lies in the discussions. And the documentation will be available (scribes, presentations, ...). 
16:58:11 <mhellwig> ... We have linked the constituents, but it may take several years before things get moving
16:58:50 <mhellwig> ... in the future we'd like to co-locate this event
16:59:33 <mhellwig> DaveL: [closes today's event; thanks participants]
16:59:54 <mhellwig> DaveL: Videos, presentations, transcripts will be available
17:00:03 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
17:00:08 <tingley> tingley has joined #mlwDub
17:01:11 <fsasaki> cfp for localization world event is here http://www.localizationworld.com/lwseattle2012/feisgiltt/
17:01:26 <mhellwig> DaveL: [thanks to organisers of event]
17:02:38 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki
17:02:43 <fsasaki> meeting adjourned
17:02:45 <RRSAgent> I have made the request to generate http://www.w3.org/2012/06/11-mlwDub-minutes.html fsasaki