MLW-LT WG -- 26 Nov 2013

LIDER presentation from asun

<fsasaki> slides are at https://lists.w3.org/Archives/Member/member-multilingualweb-lt/2013Nov/att-0000/LIDER_and_MWeb-madrid.2013-11-26__asun_.pptx

<fsasaki> serge: an example of linked open data?

<fsasaki> asun: geographic data, library data, ... all converted to linked open data, available as RDF

<fsasaki> ... companies have migrated data to rdf. dbpedia, freebase, datasets related with music, ...

<fsasaki> asun: let's suppose we have different data in libraries

<fsasaki> [example of various library sources]

<fsasaki> asun: a user wants to do research on a famous author

<fsasaki> .. normally she has to search each library catalogue separately

<fsasaki> .. linked data is about bringing the data sources together

<fsasaki> .. since the data sources are in the same format (RDF) they can easily be connected

<fsasaki> .. e.g. miguel de v. has just one identifier in all data sets, independent of the language

<fsasaki> .. more and more people start to publish data in RDF - visualization of linked open data cloud

<fsasaki> [example of library data]

<fsasaki> asun: everything in linked data is in RDF

<fsasaki> .. RDF is a W3C standard to represent data on the Web

<fsasaki> .. data model is very simple: there is a subject , a property and a value

<fsasaki> .. example: a person (sub) is the author (property) of the painting (value)

<fsasaki> .. now another triple added to the existing ones

<fsasaki> .. the data is normally not built manually, e.g. by transforming existing data to RDF

<fsasaki> .. important part is: we have two levels

<fsasaki> .. content and data level

<fsasaki> .. on content level: person is writing a book, etc.

<fsasaki> .. for each item on the content level we provide a URI: e.g. URI for a person, the book, the property "author" etc.

<fsasaki> .. e.g. URI from spanish national library of author "Miguel de Cervantes"

<fsasaki> .. URI is provided by the authority in the field: here the spanish national library

<fsasaki> .. you can use your own vocabulary to create the data, or use existing ones

<fsasaki> .. the model is in RDFS: it defines what URIs you can use in the content.

<fsasaki> .. in order to interlink data, you use the sameAs property. You write a triple like: Cervantes(URI of spanish library) is sameAs Cervantes(URI of a world wide library)

<fsasaki> asun: there are tools for discovering these kinds of links
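
The triple model and the sameAs interlinking described in this turn can be sketched roughly as follows. This is a minimal illustration using plain Python tuples rather than an RDF library. The library URIs and the `authorOf` property are made-up placeholders, not the real identifiers of the Spanish national library or any other catalogue; only the `owl:sameAs` property URI is a real one.

```python
# Each RDF statement is a (subject, property, value) triple.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # real OWL property URI

# Hypothetical URIs, for illustration only.
SPANISH_CERVANTES = "http://example.org/spanish-library/cervantes"
WORLD_CERVANTES = "http://example.org/world-library/cervantes"
AUTHOR_OF = "http://example.org/vocab/authorOf"

triples = {
    (SPANISH_CERVANTES, AUTHOR_OF, "http://example.org/spanish-library/don-quixote"),
    (WORLD_CERVANTES, AUTHOR_OF, "http://example.org/world-library/novelas-ejemplares"),
    # The interlinking triple: both URIs identify the same person.
    (SPANISH_CERVANTES, OWL_SAME_AS, WORLD_CERVANTES),
}


def same_as_closure(uri, triples):
    """Return every URI reachable from `uri` via sameAs links, in either direction."""
    seen, todo = {uri}, [uri]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if p != OWL_SAME_AS:
                continue
            for a, b in ((s, o), (o, s)):
                if a == current and b not in seen:
                    seen.add(b)
                    todo.append(b)
    return seen


def works_by(uri, triples):
    """Collect works attributed to `uri` or to any URI declared sameAs it."""
    aliases = same_as_closure(uri, triples)
    return sorted(o for s, p, o in triples if s in aliases and p == AUTHOR_OF)
```

Calling `works_by(SPANISH_CERVANTES, triples)` returns the works from both catalogues, which is the "bringing the data sources together" effect: the user no longer has to query each catalogue separately.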

<fsasaki> .. when I have the URIs, I can use them for browsing the data

<fsasaki> [example of Cervantes URI browsed in spanish dbpedia = RDF data source generated from wikipedia]

<fsasaki> serge: what tool are you working with?

<fsasaki> asun: various tools: generating RDF, creating links, visualization of data, ...

<fsasaki> .. the tool we have for browsing is here http://dbpedia.org/page/Miguel_de_Cervantes

<fsasaki> [above link is from English dbpedia]

<fsasaki> asun: key points about linked data are: we need models, data to be transformed to RDF, and data to be stored on the Web to query it

<fsasaki> serge: the tool that you click on - is that public?

<fsasaki> asun: sure

<fsasaki> [visualization of Cervantes links, taken from dbpedia data source]

<fsasaki> asun: if we include all the multilingual information into RDF, somebody can use the data in her own language

<fsasaki> .. currently there is not a lot of information in linked data in different languages
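
One way to picture "using the data in her own language": RDF literals can carry a language tag, so a single resource can have one label per language. The sketch below assumes plain tuples instead of an RDF library; the book URI and the `label_for` helper are hypothetical, while `rdfs:label` is the real RDFS property.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"  # real RDFS property URI
BOOK = "http://example.org/library/don-quixote"  # hypothetical URI

# Values are (text, language tag) pairs, mimicking language-tagged literals.
triples = [
    (BOOK, RDFS_LABEL, ("Don Quijote", "es")),
    (BOOK, RDFS_LABEL, ("Don Quixote", "en")),
    (BOOK, RDFS_LABEL, ("Don Quichotte", "fr")),
]


def label_for(uri, lang, triples, fallback="en"):
    """Return the label of `uri` in `lang`, else in `fallback`, else None."""
    by_lang = {
        value[1]: value[0]
        for s, p, value in triples
        if s == uri and p == RDFS_LABEL
    }
    return by_lang.get(lang, by_lang.get(fallback))
```

When a language is missing, as it often is today according to the discussion, the caller only gets the fallback label, which is exactly the gap being described.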

<fsasaki> serge: the ontology will be different depending on the language

<fsasaki> .. e.g. Cervantes will be detailed in Spanish, but other items will be more detailed in other languages

<fsasaki> asun: we have done work in this area, see the thesis from Elena - and with the lemon lexicon model we can deal with that

<fsasaki> pedro: it is not critical for them, they can build applications with these differences

<fsasaki> asun: example of different taxonomies of fishes, e.g. more details in Japan

<fsasaki> .. the point is: if we have a model e.g. in the library domain, the so-called IFLA model

<fsasaki> .. it is not difficult to translate the model to other languages

<fsasaki> serge: true - the object "certain fish" may have different names

<fsasaki> .. but what to do with expressions of snow

<fsasaki> .. some areas of culture are bigger in one language than in others

<fsasaki> asun: sure, but we can map the differences

<fsasaki> .. with different approaches

<fsasaki> elena: we have investigated how to deal with cultural discrepancies

<fsasaki> pedro: this is a mechanism to relate information

<fsasaki> serge: ontologies are different in different languages

<fsasaki> asun: sure - but some parts can be the same, and there are different ways to interlink between languages

<fsasaki> asun: so we are aware of the problem - but there are ontologies that have been built that show consensus

<fsasaki> .. there are different approaches

<fsasaki> .. the topics of localization and cultural aspects are important

<fsasaki> ... in LIDER we want to see: how can the data created as linked data be used for natural language processing

<fsasaki> [LIDER presentation continued, slide 5]

<fsasaki> arle: will make a different visualization of the "LOD is dominated by English" slide

<fsasaki> "Linguistic Linked Open Data Cloud": language resources converted to RDF

<fsasaki> asun: linguistic linked open data can help to improve natural language processing services

<fsasaki> .. and the NLP services can help to improve / evaluate what we have as LOD

<fsasaki> asun: project is about gathering a community

<fsasaki> .. question is: what extensions do we need to the linked data cloud for NLP / content analytics

<fsasaki> .. we want to create not general LOD resources, but 3LD: Linguistic Linked Licensed Data

<fsasaki> .. linguistic LOD is a subset

<fsasaki> .. important: this is not about open data

<fsasaki> .. this is about linguistic data. Some can be open, some can be closed

<fsasaki> .. there is also a group in W3C working on how to express license information in RDF

<fsasaki> .. some institutions can get new business models by making their data available in that way

<fsasaki> dave: a lot of linked data has no license information at all

<fsasaki> asun: important point

<fsasaki> .. the license aspect in linked data is real chaos

<fsasaki> dave: people publish data without license detail

<fsasaki> .. so commercial companies that want to use the data don't know if they can use it

<fsasaki> asun: RDF is a technology to do something

<fsasaki> .. nothing more

<fsasaki> .. the difference to e.g. an oracle data base is: RDF is a standard

<fsasaki> .. but what you do with it with regards to licenses depends on you
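
The missing-license problem raised by dave could be checked mechanically once license statements are expressed as triples. The following is only a sketch under assumptions: it uses the Dublin Core Terms `license` property as one possible way to attach a license (the minutes do not say which vocabulary the W3C group uses), and the dataset URIs and the closed-license URI are invented for illustration.

```python
DCT_LICENSE = "http://purl.org/dc/terms/license"  # Dublin Core Terms property

# Hypothetical dataset URIs, for illustration only.
OPEN_LEXICON = "http://example.org/dataset/open-lexicon"
CLOSED_TERMBASE = "http://example.org/dataset/closed-termbase"
UNLABELLED_CORPUS = "http://example.org/dataset/unlabelled-corpus"

triples = [
    (OPEN_LEXICON, DCT_LICENSE, "http://creativecommons.org/licenses/by/4.0/"),
    # A closed, commercial license is still a license statement.
    (CLOSED_TERMBASE, DCT_LICENSE, "http://example.org/licenses/commercial"),
    # No license triple at all for UNLABELLED_CORPUS.
]


def datasets_without_license(datasets, triples):
    """Return the datasets that have no license triple, in input order."""
    licensed = {s for s, p, _ in triples if p == DCT_LICENSE}
    return [d for d in datasets if d not in licensed]
```

Here `datasets_without_license([OPEN_LEXICON, CLOSED_TERMBASE, UNLABELLED_CORPUS], triples)` flags only the unlabelled corpus: open and closed data are both usable in principle, while data with no statement leaves a commercial user guessing.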

<fsasaki> pedro: if you present this to globalization / localization world

<fsasaki> .. they won't see themselves in the picture

<fsasaki> .. the gap between language technology and localization is huge

<fsasaki> asun: that is an important point

<fsasaki> .. we decided that we will focus not only on NLP, but also translation, to attract also the localization community

<fsasaki> .. LIDER has three goals: community building, industry use cases, and a roadmap / reference architecture

<fsasaki> asun: we want to explore: which task would require linked data

<fsasaki> .. and see: which type of service could be provided

<fsasaki> slides are now at http://www.w3.org/International/multilingualweb/lt/wiki/File:LIDER_and_MWeb-madrid.2013-11-26_asun.pptx

<fsasaki> serge: this is a big topic

<fsasaki> .. the "gala audience" ("language service providers") are just about generating money - they don't have time to think about big things

<fsasaki> .. so what you presented is great, but how do we present it to the LSPs

<fsasaki> davidF: there is a big divide between "text community" and "data community"

<fsasaki> .. we need to bring these together

<fsasaki> serge: these technologies may generate new business opportunities for the industry

<fsasaki> [presentation continued, slide 12]

<fsasaki> asun: need to gather feedback from data community and language community, via roadmapping workshops

<fsasaki> asun: communities to approach: also libraries, dublin core (but very general vocabulary, ...) etc.

<fsasaki> olaf-michael: we saw in the MLW-LT group that it helps to have representatives from the various communities

<fsasaki> .. as part of the outreach it may make sense to mention the communities that may be interested in contributing

<fsasaki> asun: the roadmapping ws for the 2nd year are open, topic wise

<fsasaki> .. these roadmapping ws are not conferences, but with the focus of going into depth in a topic

<fsasaki> richard: need to liaise with w3c WG, e.g. data area

<fsasaki> .. bring phil archer in

<fsasaki> dave: issue with open data is: how to make money out of that

<fsasaki> .. the localization industry can help to find such use cases

<fsasaki> discussion on what is needed to make the technology discussed in LIDER available for the localization industry

<fsasaki> asun: the linked data value chain is not analyzed yet

<fsasaki> .. the chain will be different depending on the use case and the community involved

FALCON project

<fsasaki> .. TCD, DCU, XTM, Interverbum and EasyLing are partners

<fsasaki> .. started in October. Learned from industry partners: don't explain linked open data, but rather start with the benefits

<fsasaki> scribe: dvilasuero

<omstefanov> is there a link to dave's presentation?

<fsasaki> omstefanov, not yet, will provide that later

<omstefanov> :-(

dave: FALCON brings together expertise from academia and vendors within the field of localization, translation and Linked Data, working mainly towards interoperability

creating an Open Schema and a SaaS platform for language service providers, with special attention to workflows

dave: W3C provenance vocabularies as an important mechanism for localization processes and workflows

<fsasaki> an older FALCON presentation: http://www.lt-innovate.eu/system/files/attachments/3.%20falcon-%20Dave%20Lewis.pdf

<fsasaki> I'll upload the one presented today later

??: european bodies want to have their own linguistic data and not depend on external companies

<fsasaki> project web site at http://www.falcon-project.eu

that is the reason why EU is funding these kinds of projects (e.g., FALCON, LIDER)

asun: Linguistic Linked Data as a replacement for machine translation memories, how do you foresee this situation?

Dave: Integrate LOD with existing tools and processes rather than substituting them

asun: Dynamics and synchronization are key for translation memories and processes and this is still an open issue for many Linked Data systems

Dave: Provide the producers with the mechanisms for publishing and sharing their resources

pedro: The LD paradigm is a shift from strings to concepts and that should be taken into account
... we should talk about LD as another resource, not as a substitute for, for example, translation memories

dave: FALCON is driven more by direct vendor technological needs, trying to solve some of their specific use cases in localization processes.
... Some technology will be open source, other will be kept by vendors and integrated within their systems
... there are a lot of different components that need to be built to leverage LD: URI/L persistence, scalability, etc.

olaf-michael: TAUS, initiative about sharing data, apis, etc.

<fsasaki> speaker is olaf-michael

<fsasaki> http://www.w3.org/International/multilingualweb/lt/wiki/Madrid_November_2013_f2f#Tuesday_26th

MLW workshop planning

fsasaki: the key behind the success of MLW is its broad and general focus, integrating participants from different sectors and backgrounds (LOC world, LTS, and now Data community)

<fsasaki> http://www.multilingualweb.eu/documents/rome-workshop/rome-program

fsasaki: For last event in Rome, the workshop was organized into different thematic sessions
... user, data, translation, etc.
... MLW WS is not an academic workshop, in terms of community and effort needed for organizing it

<fsasaki> http://www.multilingualweb.eu/documents/rome-workshop/rome-cfp

fsasaki: preparation of the WS involves a lot of effort in identifying key players within each topic, sector, etc.
... regular phonecalls to pull key players, rather than selecting just based on academic contributions
... the spirit of the MLW WS will be kept, Linked Data as a dedicated session and also a third day for the Lider roadmapping workshop
... discussion about the dates will take place after lunch

asun: It could be interesting to provide an intro to or a short tutorial on LD within the MLW WS

??: introduction could happen offline, provided in advance to participants from industry. This could be more productive than a pure educational session

scribe: there's a process of digestion, you need to give vendors and services providers time to analyze the impact of new technologies

fsasaki: articulate the story behind multilingualweb brand and its evolution.

WS dates

<fsasaki> http://www.multilingual.com/events.php?pageNum_rsEvents=1

<omstefanov> Felix: JIAMCATT 2013 is 23-25 April 2014 at the Council of Europe (CoE), Strasbourg, France

<fsasaki> first check if current upm campus is available; if not then check the camino; if not go to upm remote campus

<fsasaki> proposal: WS day 1: 1 april

<fsasaki> WS day 2: 2 april

<fsasaki> afternoon of day 2: roadmapping WS as dedicated event

<fsasaki> dates: 13-14 may, fixed

<fsasaki> ACTION: felix to talk to jaap about open space session [recorded in http://www.w3.org/2013/11/26-mlw-lt-minutes.html#action01]

<trackbot> Created ACTION-585 - Talk to jaap about open space session [on Felix Sasaki - due 2013-12-03].

<fsasaki> 6-7, or 7-8 may new options

<fsasaki> http://www.multilingualweb.eu/

<fsasaki> ACTION: felix to bring mlw ws through w3c mgmt within the next 2 weeks [recorded in http://www.w3.org/2013/11/26-mlw-lt-minutes.html#action02]

<trackbot> Created ACTION-586 - Bring mlw ws through w3c mgmt within the next 2 weeks [on Felix Sasaki - due 2013-12-03].

PC

<fsasaki> approach encarna lous about pc

<fsasaki> invite original PC people back

<fsasaki> felix will write mail to all previous PC members and the LIDER people. Confirmation of PC will be done on Monday

<omstefanov> what about those who raised their hands now?

timing of cfp

<fsasaki> publish cfp before christmas

<fsasaki> http://www.multilingualweb.eu/documents/rome-workshop/rome-cfp

<fsasaki> contact multilingualmag about calendar

<fsasaki> need to have keynote speakers

<fsasaki> https://lists.w3.org/Archives/Team/multilingualweb-pc/

<fsasaki> https://www.w3.org/International/multilingualweb/wiki/WS6

MLW branding

<fsasaki> richard explains MLW logo reasoning

<fsasaki> richard: lider logo like in http://www.w3.org/International/multilingualweb/lt/wiki/File:LIDER_and_MWeb-madrid.2013-11-26_asun.pptx presentation is fine, with MLW included

MLW web page

<fsasaki> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Nov/0054.html

<fsasaki> http://www.multilingualweb.eu/

<fsasaki> "home" "workshops" "projects" tab. Under "projects" a pull down that links to the projects. LIDER on the tab, then MLW-LT, then original MLW

<fsasaki> "blog" section would go away

<fsasaki> "about" and "links" would be part of the project specific tabs

<fsasaki> idea is to reorganize the content within the next weeks to have the dedicated LIDER section

<fsasaki> asun: how to announce LIDER event?

<fsasaki> send a mail to richard and felix

<fsasaki> daniel: aggregate on MLW portal #lider-eu

<fsasaki> richard: we will look into that, depends on ERCIM based setup of the portal

\me thank you richard and felix

L3D survey

<r12a> np dvilasuero

<fsasaki> https://docs.google.com/document/d/1D3sEfmS746CWHZ1U3rep8KwHpgTo00qY6jGkDDjPrQg/edit

<fsasaki> asun: add data patterns on how to create and deploy linked data

<fsasaki> oxygen example at http://www.youtube.com/watch?v=F6zIW6blF5k

<fsasaki> going through the survey

<fsasaki> tadej: information retrieval & search are missing

<fsasaki> .. also recommender systems

<fsasaki> .. not strictly nlp, but falls into the bucket

<fsasaki> .. some things are individual NLP tasks, some are like features for a system

<fsasaki> dave: good points, will also pass that around in cngl

<fsasaki> tadej: you should not go from technology but from features, e.g. "this feature can search for synonyms"

<fsasaki> .. include also sentiment analysis, content classification

arle about qtlp and qt21

<fsasaki> http://www.qt21.eu/

<fsasaki> arle: media is important too - focus on tv subtitling in various languages

<fsasaki> elena: for a human translator the audience is important when applying metrics

<fsasaki> arle: you can specify that

<fsasaki> http://mlwlt.moravia.com/mlwlt-web-test/Presentation.aspx

h2020 topics

<fsasaki> open source tools in more ergonomic fashion

<fsasaki> based on open web platform

<fsasaki> funding open source tools - web service for readiness and beyond

<fsasaki> basic corpora to develop open source tools in many languages

<fsasaki> provide a model to work better taking also legal issues into account

<fsasaki> its2 in a box - end2end implementation

<fsasaki> drupal with preloaded plugins, like "moses for human beings"

<fsasaki> having metadata generated automatically (not only text analysis, also quality checking etc.)

<fsasaki> extend its2 not to cover only markup but other applications, as a general annotation mechanism

<tadej> http://code.google.com/p/moses-for-mere-mortals/

<fsasaki> support for localization in javascript

<fsasaki> evangelize - best practice documentation on how to use its2 in practice, how to convince the content providers to use the technology

<fsasaki> bring its2 closer to semantic web - semantic MT / semantic web search

<fsasaki> have more automatic annotation, so that we have data

<fsasaki> its2 - evangelization, adoption; how to direct h2020 calls with input from LIDER

<fsasaki> bring semantic technologies to certain task - cross lingual data access. develop the critical mass of linked data + core technologies (cross lingual core tec tasks)

<fsasaki> bringing multilingual linked open data cloud together with MT community and localization

<fsasaki> bridging localization and linked data and discover relevant implementations

<fsasaki> make its2 the default

<fsasaki> work on integrating linport and its2

<fsasaki> have xliff 2.0 finished

<fsasaki> to deliver xliff to developers - its2 is a start on the metadata path

<fsasaki> a great fuzzy matcher for TM, taking ontologies and other opportunities into account

<fsasaki> build a platform or a set of services to help users to use the resources

<fsasaki> discover resources automatically

<fsasaki> make annotation process more ergonomic and integrate user into a feedback loop in annotations

<fsasaki> what changes if you look at documents (not web content) with other structures and tools

<fsasaki> making it easier to annotate / create links, improve quality of links in a social process

<fsasaki> dealing with multimedia content (e.g. audio tracks)

<fsasaki> dealing with other types of content (javascript, media formats) for annotating content

<fsasaki> work with personalized content

<fsasaki> xliff 2.0 finish; related: open source cat tools

<fsasaki> more support for user interfaces

<fsasaki> tools for injecting various types of standardized metadata

<fsasaki> reference set of javascript etc. libraries to do CAT tool processing

<fsasaki> leverage "open web platform" technologies

<fsasaki> demo of jquery library http://www.w3.org/International/its/ig/demos/its2-in-the-browser.html

<fsasaki> (library developed by cocomore)

<fsasaki> stop defining XML file formats, but rather object models

<fsasaki> have more MLW workshops, also outside Europe

<fsasaki> publishing a "beginners guide on ITS"

<fsasaki> add back directionality / ruby into ITS

<fsasaki> looking at data formats for names / addresses / ...

<fsasaki> gather international typographic requirements, and implement the things in tools (browsers, editing tools, ...)

<fsasaki> review core w3c specs for i18n / l10n issues

<fsasaki> tool for automatic / semiautomatic pre-editing / annotation

<fsasaki> not only cat tool but computer-assisted "knowledge tool", including knowledge exploitation

<fsasaki> disruptive technology - MT machine learning and other methods

<fsasaki> allowing the data experts to work with MT and other applications, bringing in their data and knowledge

<fsasaki> [discussion on open source cat tools market share]

<fsasaki> automatic annotation / UI to make the annotation easier, the rest will follow?

<fsasaki> adjourned

- DRAFT -
