07:51:36 RRSAgent has joined #mlwrome
07:51:36 logging to http://www.w3.org/2013/03/13-mlwrome-irc
07:51:50 meeting: MLW workshop Rome, day 2
07:51:55 philr has joined #mlwrome
07:52:03 scribe: DomJones
07:52:14 rrsagent, make log public
07:52:30 topic: multilingual linked open data patterns
07:52:39 rrsagent, draft minutes
07:52:39 I have made the request to generate http://www.w3.org/2013/03/13-mlwrome-minutes.html DomJones
07:52:47 chair: arle
07:53:23 agenda: http://www.multilingualweb.eu/en/documents/rome-workshop/rome-program
07:53:31 waiting for the meeting to start ...
07:53:47 present+ many, many, many, people
07:53:51 rrsagent, draft minutes
07:57:16 Arle has joined #mlwrome
07:58:07 Jose: best practices for MLOD were given at the last workshop. A pattern is a solution to a problem. Good to have a catalog of patterns to select from. Common vocabs.
07:58:11 daveL has joined #mlwrome
07:58:14 fsasaki has joined #mlwrome
07:58:19 ... best solutions for multilingual linked open data
07:58:38 ... each pattern has a description, example and discussion.
07:59:28 ... Patterns cover naming, dereference, long descriptions, linking and reuse factors.
07:59:46 www.weso.es/MLODPPatterns
07:59:51 ... 20 patterns, for the community to add to and adapt
08:00:36 ... example: a person is Armenian and a professor at the University of León. The person has birthplace, position and worksat
08:01:23 ... 1st select a URI scheme. Descriptive URIs use human-readable ASCII characters
08:02:04 tadej has joined #mlwrome
08:02:15 ... another pattern is opaque URIs, where local names are not human readable. These are independent of any natural-language implementation
08:02:32 ... These are hard for developers to handle
08:02:58 ... So: descriptive URIs, opaque URIs and full IRIs
08:03:40 ... internationalised local names. Domain name is in ASCII chars but local name is in local chars
08:04:04 correction: http://www.weso.es/app/webroot/MLODPatterns/
08:04:14 ... another pattern is to include a language tag in the URI
08:05:23 ... Dereference: return labels based on the language code of the user
08:05:43 ... semantic equivalence of data needs to be identified.
08:06:31 ... Labelling - label everything, including with multilingual labels. ML labels are a problem when a query looks for monolingual labels.
08:06:46 ... solution = labels with no lang tag
08:07:37 ... but then which language is the default? Longer descriptions are difficult to handle; better to have finer-grained descriptions to separate out labels.
08:08:26 ... for longer descriptions there is the possibility of structured literals.
08:09:48 ... linking the same concepts in different languages which are identified as being the same. However, contradictions exist. Link linguistic meta-data exists, 1st class lang annotations.
08:10:25 ... reuse: vocabs are generally monolingual. Multilingual vocabs are more difficult to maintain
08:10:53 ... can create new localised vocabs
08:11:41 ... future work - session on best practices for ML LOD, opportunity to improve catalog / add to / remove from catalog
08:12:23 topic: multilingualism in Linked Data
08:12:54 Asuncion: all ML concepts should be addressed in LD generation
08:13:11 ... model is simple, everything is in RDFS
08:13:55 ... subject, property, value. Unique identifiers, URIs are used. Subjects are represented by URIs.
08:14:02 ... using equiv links to link data sets
08:15:46 ... lots of info sources in different languages. RDF generation and linked data allow for graphical representation of ML LOD sets
08:16:02 ... currently looking at million literals data set
08:16:31 ... number of literals with language tags has increased from 2011 to 2012
08:17:32 ... still mostly in English. Data in other languages are similar. Most data is in English as not many countries are providing LD in languages other than English
08:18:09 ... in LD cloud ML queries are achieved through 6 stages.
08:20:37 ... 1) specification, how to model data sets. 2) Translate labels of ontology into other languages, align vocabs of other languages. Reuse / align existing vocabs. 3) RDF generation, use richer models for applications
08:21:01 tadej has joined #mlwrome
08:21:10 ... 4) link generation - how to discover cross-lingual links - how to represent cross-lingual links - how to store and reuse links.
08:23:05 ... concepts are tagged in language-based ontology, these ontologies are linked, cross-lingual links. Properties describe medicine
08:24:13 ... ontology in German and Spanish, translate German into Spanish and check for alignments or use cross-lingual ontology matching across both
08:24:55 ... 5) publication - links can be discovered at run time or offline, some storage method is needed for links already discovered.
08:25:57 ... 6) Exploitation: how to adapt semantic query to linguistic and cultural background of a user. Also how should results of semantic query be adapted.
08:27:49 ... For ML LOD many services need to exist from generation through to consumption - ML LOD should be provided through service translation but now we should start including lang features in the generation of data
08:28:38 Topic: Public Linked Open Data
08:29:26 Peter: Large repository for public linked open data
08:30:42 ... Publications Office of the EU is the publisher for EU institutions, legislative and non-legislative documents. Whole process of document management. Finally moving from paper to electronic model and from publisher to data provider
08:31:04 ... shift from paper to electronic makes the electronic version of the EU journal legally binding.
08:32:02 ... Multilingualism is core, 23 languages used. Every EU member state requires publication in its own language. For example 2600 pages per document * 23 langs
08:32:36 ... ML supports all member states equally, therefore ML public websites must exist. For law, procurement, CORDIS and general publication bookshop.
08:32:53 ... Four systems for the ML semantic web
08:33:10 ... CELLAR, EU Data Portal, Eurovoc, MDR
08:34:58 ... 1) CELLAR is currently in production, not yet public, being loaded / populated, some key concepts - repos is defined by common data model (ontology). Semantic model is built up by these components. Loading is standardised, 30TB of data
08:35:55 ... in repos content is stored at top level, meta-data is linked to this. Distribution side and SPARQL endpoint.
08:36:22 ... 700M triples in the store. Mainly PDF, XML and XHTML.
08:36:59 ... accessible through RESTful API or through SPARQL endpoint
08:37:47 ivan has joined #mlwrome
08:38:25 ... 2) EU Data Portal. Single point of access to all structured data for linking and reuse of commercial and non-commercial data
08:38:25 hi
08:39:15 ... RDF-based interface for upload of meta-data
08:39:45 ... 3) EUROVOC available in SKOS/RDF or XML format.
08:40:45 4) Meta-data Registry (MDR): concepts which have been validated are published through CELLAR, controlled vocabs etc
08:41:44 5) For English all the languages of the EU are presented, translations are discussed between all units in the EU and therefore official translations (by member states) exist
08:42:10 which presenter is this? i'm watching live and just opened the live stream
08:42:26 ... European Legislation Identifier (ELI) follows W3C RDF / XML to provide data in standardised way.
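A note on the label handling that recurs in this session (labels returned according to the user's language, and the open question of which language is the default when a label has no language tag): the following is a minimal Python sketch of one possible fallback order. The label table, its values and the function name `pick_label` are illustrative assumptions; none of them comes from CELLAR, EUROVOC or the pattern catalog.

```python
from typing import Optional

# Hypothetical labels for a single concept, keyed by lowercase language tag.
# The empty key "" stands for a label published without a language tag.
LABELS = {
    "es": "profesor",
    "fr": "professeur",
    "": "professor",
}


def pick_label(labels: dict[str, str], preferred: list[str]) -> Optional[str]:
    """Return the first label that matches the user's language preferences.

    For each preferred tag, try the exact tag, then its primary language
    (e.g. "es-ES" falls back to "es"). If nothing matches, return the
    untagged label, or None when there is no untagged label either.
    """
    for tag in preferred:
        tag = tag.lower()
        if tag in labels:
            return labels[tag]
        primary = tag.split("-")[0]
        if primary in labels:
            return labels[primary]
    return labels.get("")


if __name__ == "__main__":
    print(pick_label(LABELS, ["es-ES", "en"]))
    print(pick_label(LABELS, ["de"]))
```

Treating the untagged label as the last resort is only one answer to the "which language is the default?" question raised in the talk; a publisher could equally decide on an explicit default language.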
08:43:38 topic: Multilingual Issues in the Representation of International Bibliographic Standards for the Semantic Web
08:44:08 Gordon: IFLA is the body which maintains global standards for the library and bibliographic environment.
08:45:04 ... Separate to IFLA are ISBD and UNIMARC; all three relate to library / biblio standards,
08:45:09 ... all three are used internationally.
08:45:56 ... IFLA has its own namespace for standards. Supports conversion from library linked data without loss of information.
08:47:13 ... IFLA has 7 languages. Standards generally written in English and then translated into the 7 languages.
08:47:40 ... ML website launched in Spanish, partial doc in Spanish of what exists in Spanish already.
08:48:33 ... Open Metadata Registry is used to store classes, URIs for each maintainer. These are opaque so as to avoid lang bias when used in RDF
08:49:44 Monica has joined #mlwrome
08:50:07 ... ISBD elements - problems occurred when namespace was translated. Translation into Spanish became guidelines for doing future translations. Contains much info on the problems / issues of translations.
08:50:30 nwaltham has joined #mlwrome
08:51:29 ... Problems 1) scope, what is translated first and what is most useful. (Developers - element definitions, labels) (Users - what they see, labels of concepts in value vocabs).
08:52:03 ... 2) Style: verbal phrasing, CamelCasing etc
08:52:51 ... hasAuthor, hasTitle do not translate perfectly into other languages. CamelCasing looks bad in other languages, what's ok in one language may not work in another.
08:53:03 DomJones: are you a bot?
08:53:16 fsasaki: can i ask questions to speakers from irc?
08:53:59 3) Disambiguation methods for creating labels may vary between languages.
08:54:05 lmatteis, you can ask the question and I can relay your question in the q/a session - which starts 10:15
08:54:38 ok thanks!
08:55:03 4) Language inflection
08:56:16 ... Partial translations: only preferred label translated. Have to track status of translation through a number of stages; schedules and status tracking are required.
08:57:08 ... MulDiCat for authoritative translations of IFLA standards, available in open meta-data repository as well. More than 26 langs represented.
08:57:48 topic: Language Technology Tools for Supporting the Multilingual Web
08:57:59 "What's the reason behind having 'opaque' URIs, and translating RDF predicates? They are merely identifiers, and as long as 'label' and 'definitions' have been properly translated, I see no reason of further complicating RDF vocabularies with multiple translations"
08:58:02 fsasaki: that would be mine :)
08:58:13 nwaltham_ has joined #mlwrome
08:58:13 ok
08:58:37 maybe just the first part ;) if it's too long
08:58:49 np, will be ok
08:59:19 Thierry: on the web: ML pages, dictionaries, tools. Not every document is available in every language. When I access the web in German or French I don't often get docs in other languages. Monolingual search
09:00:59 ... semantic resources are already available on the web. We have ML web, pages, resources but we want the Sem Web to run in combination with lang tech so we can annotate text
09:02:15 ... GICS - classIDs, labels, these labels use non-standard formats etc.
09:03:19 ... towards ML linguistic Semantic Web so labels can be encoded in RDF using Lemon model - also want to mention Linguistic Linked Open Data.
09:04:21 ... annotate text available in multiple languages 1) take all labels, analyse, combine in semantic repos using Lemon and apply to running annotated text. Can also be stored in queryable tool.
09:04:59 ... in one ontology you display suggestion for ML labels encoded in ontology
09:06:35 ... NooJ can be used to test NLP analysis of labels, difference is the way natural language can be expressed
09:08:25 ... need to harmonise and modify a label for NLP. Terminological expansion of labels provides taxonomies for preferred labels. From 1 label 5 labels can be generated, annotated using Lemon and exported
09:09:25 ... triggering of ellipsis resolution to cross-lingual labels in other languages. Labels are expanded based on property of another language.
09:10:16 ... From this we discovered semantic annotation of web documents in many languages.
09:11:02 ... Text from Spanish stock market, two similar taxonomies generate two annotations, both labels point to same concept but are textually different.
09:12:00 ... labels can be displayed in many other languages and allows for annotations in higher level languages.
09:12:18 ... need to make sure these are compliant in terms of standardisation.
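The annotation step described in this talk (labels collected from taxonomies, then applied to running text, with textually different labels pointing to the same concept) can be sketched roughly as below. This is a plain dictionary lookup in Python, not the Lemon or NooJ tooling mentioned by the speaker; the label table, the concept identifier `concept:001` and the function name `annotate` are hypothetical.

```python
import re

# Hypothetical label table: (label text, language tag) -> concept identifier.
# Two English labels and one Spanish label all point to the same concept.
LABELS = {
    ("net income", "en"): "concept:001",
    ("net profit", "en"): "concept:001",
    ("beneficio neto", "es"): "concept:001",
}


def annotate(text: str, lang: str) -> list[tuple[int, int, str, str]]:
    """Find every label of language `lang` in `text`.

    Returns a sorted list of (start, end, matched text, concept id).
    Matching is case-insensitive and limited to whole words.
    """
    hits = []
    for (label, label_lang), concept in LABELS.items():
        if label_lang != lang:
            continue
        pattern = r"\b" + re.escape(label) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.end(), match.group(0), concept))
    return sorted(hits)


if __name__ == "__main__":
    for hit in annotate("Net income rose; net profit too.", "en"):
        print(hit)
```

A real pipeline would also need the linguistic analysis of labels (lemmatisation, inflection, ellipsis resolution) that the talk describes; exact string matching is used here only to keep the sketch short.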
09:12:34 Topic: Question And Answer
09:14:00 question from lmatteis: "What's the reason behind having 'opaque' URIs, and translating RDF predicates? They are merely identifiers, and as long as 'label' and 'definitions' have been properly translated, I see no reason of further complicating RDF vocabularies with multiple translations"
09:14:37 Gordon: Several reasons for opaque URIs. 1) A non-opaque URI must be based on something, so if the label changes the URI can't change, so it's more confusing. 2) Favouring any language over another is not good practice. 3) When translating property and class labels we're using opaque URIs
09:14:55 fsasaki: thanks!
09:15:07 np
09:15:59 Ivan: Linked Data community doesn't know "anything" you guys are doing. Until the larger LD community is aware of your work I don't see anything changing. For devs to take ML LD into account they need to be aware of your work
09:16:21 isn't Ivan the leader of w3c
09:16:34 ?
09:17:01 Asuncion: Ontology-Lexical WG is being proposed to be used for representing. Big countries investing in LD are English speaking and are not immediately interested in ML LD.
09:17:19 lmatteis, ivan is the semantic web activity lead at w3c
09:17:39 ... From SW perspective we need a road-map to push these ML issues. White paper for community addressing these issues.
09:18:31 Ivan: W3C working groups are not suited for this. For example schema.org represents vocabs that are used, we can't ignore them. Need to try to get the authors of schema.org to think about ML data
09:18:47 ... labelling and documentation in ML form would be a huge step
09:19:03 Jose: I agree with your point, hence catalog of patterns has been produced.
09:19:32 ... need to educate, hence best practices for ML LOD
09:19:35 Monica has joined #mlwrome
09:20:35 Asuncion: Trying to analyse how languages are used and how these linguistic choices are applied to data sets
09:22:06 chaals: annotating other people's vocabs is socially difficult. Opaque URIs avoid having a language bias? No, the bias exists in the model, opaque URIs hide this from the top level view. We should be publishing annotations on other people's vocabs that are broken
09:23:56 jose: in the case of annotating and translating the label that you want.
09:24:16 nwaltham_ has joined #mlwrome
09:24:16 ... labelsforall.info similar to prefix.cc for label and translation recording.
09:26:49 Q: Different communities I second Ivan's views. In terms of ML-LOD cloud, when someone asks where is ML-ism? A URI is a resource that can be in many languages. Dimensions, Peter S talks of TB of LOD. Many people talk in terms of one record. In ML-LOD 1) concept
09:28:14 r12a: Aim of workshop is not just for talks but to get people together networking to move things forward.
09:29:51 christian L: 2 things 1- how far does the work you are driving / continuing affect the content authors, user cataloging, etc. Also how far is the reviewing activity considered a general reviewer's toolkit
09:30:33 Peter: no direct connections to author services, everything is translated, we're just proofreading. In being efficient we work with coded data and cataloging.
09:31:49 thierry: our work has implications for labels, taxonomies; in terms of impact it is important that we provide recommendations to change terminology to make it more applicable.
09:32:23 q: relation between work of the speakers and repositories like Freebase?
09:34:07 Feiyu: instances of Freebase can be used as a kind of interlingua, can be really useful for ML-LOD
10:03:38 scribe: philr
10:03:41 topic: Users
10:04:24 Pat Kane (Verisign)
10:04:39 Internationalized Domain Names
10:05:07 nwaltham_ has joined #mlwrome
10:05:07 topic: Internationalized Domain Names - pat kane
10:05:16 ...focus on end users
10:05:36 ...Users want to use their own scripts
10:06:41 ...growth in Asia Pacific driving non-English domain names and URLs
10:07:25 ...1m+ international domain names registered in first six months
10:07:40 ...50% CJK ideographs
10:08:10 ...Armenian scripts under-served
10:08:37 ...major browsers handle IDNs quite well
10:09:17 ...email addresses used a lot as identifiers in log-ins
10:10:13 ...What's hindering domain registrations? greater user awareness, registrars
10:10:37 ...better mobile browser support, management tools
10:11:32 ...results in a lack of trust (intent for a user to register)
10:12:13 ...users want full IDN support
10:12:31 ...lack of ubiquity an issue
10:13:05 ...IDNs are second class domains, users are suspicious of them
10:13:53 ...not comfortable with idn.ascii
10:14:21 ...SMEs in China are more open to idn.idn
10:15:56 ...5 key insights: more utility needed, initial resistance to adoption, translation preferences,
10:17:11 ...moderate interest in registration and registrar channel expectations.
10:17:36 Chinese want idn.idn not idn.ascii
10:18:10 In India respondents do not visit idn.idn
10:18:36 In Japan comfortable with ascii.ascii
10:19:13 Korea more passionate about idn.idn
10:20:13 Need multi-disciplinary groups to push adoption
10:20:37 labra has joined #mlwrome
10:21:27 ...Key roles: registries, registrars, content creators, application developers, governments and businesses
10:21:36 ...and standards organisations
10:22:13 ...circle of dependency: adoption -- ubiquity
10:22:55 ...change ecosystem to enhance user experience
10:23:01 ...ubiquity drives trust
10:24:23 ...ubiquity means not just desktop but also mobile
10:24:39 ...mobile applications are much less capable of handling IDNs
10:26:05 next speaker, Richard Ishida who is a late change to the programme
10:26:20 topic: What's in a name?
10:27:08 Richard Ishida concerned about data and data formats: specifically people's names
10:27:35 ...web sites usually ask for "first" and "last" name
10:28:11 ...Use "given" and "family" name
10:28:20 ...names are more complicated than we generally think
10:29:38 ...applications want to parse names and do things with them - e.g. in salutations, search and sorting
10:30:42 ...Björk's "surname" is actually her father's name
10:31:08 ..."bin" == son of
10:31:40 ...Mao Ze Dong - Ze == generational name
10:32:14 ...How you would address him depends on a lot of things
10:32:56 ...typically he would use a western name to make things easier for western people
10:33:35 ...multiple family names: given name plus two family names
10:34:21 ...father's name first, mother's name second - varies by country
10:34:59 ...Variant word forms indicating gender
10:36:14 ...how names are inherited varies
10:37:39 ...nicknames used often to help
10:38:01 ...written forms can be ambiguous
10:38:48 ...many Asian names can be transcribed identically
10:39:30 Recommendation: ask people how they would like to be addressed
10:39:54 ...this topic needs a lot more work
10:40:35 ...need authoritative guidance on the problems of handling names
10:42:07 next speaker, Sebastian Hellmann (another late programme change)
10:42:55 topic: LOD2 Stack and the NLP2RDF project
10:43:21 ...LOD == Linked Open Data cloud
10:43:57 ...http://lod-cloud.net data sets published on the net
10:44:31 ...free, open and open licensed
10:50:52 Sebastian going through the LOD2 stack
10:52:18 philr has joined #mlwrome
10:52:18 now about NIF format
10:52:23 scribe: philr
10:53:50 ...linguistic LOD cloud
10:55:17 ...in NIF use fragment identifiers to address primary data
10:55:56 ...can query NIF components as a web service
10:56:44 ...OLiA: Vocabulary Module - mapping of over 50 tagsets
10:57:41 ...NIF 2.0 plans - links to ITS 2.0, Lemon ontology, XPath URI scheme
10:57:41 ...NIF will be free and open
10:58:11 ...looking for contributors
10:59:03 next speaker: Fernando Serván of FAO
10:59:27 topic: Reorganizing Information in a Multilingual Website
11:00:11 Fernando Serván: FAO has presence in 82 country offices
11:00:22 ...uses 6 official UN languages
11:01:50 ...FAO users' language primarily English followed by Spanish and French
11:02:54 ...currently reorganizing content to focus on decentralization and partnerships
11:03:26 ...need to accommodate locally generated content
11:05:36 ...Issues faced: a lot of unstructured content, web content, language versions do not match, no localized URIs, low reuse of content
11:06:43 ...lack of mono- and multilingual ontologies to drive navigation
11:07:13 I still don't understand the need of having opaque URIs or URIs with numbers instead of meaningful words. This isn't going to solve the multilingual issue, it's going to make it worse if anything. Because at least the URI is readable in english, instead of not readable at all
11:07:16 ...do have a stable geopolitical ontology
11:09:13 ...need to make best use of existing content, identify normative, use CMS-independent content, use MT (for Arabic and Chinese), better understand users
11:11:05 ...want to utilize standards and best practices: XLIFF, RDF, ITS 2.0, learn from translation workflows, get social - on-demand translation
11:11:55 ...allow users to vote for pages that should be translated
11:12:44 ...have a set of short-term and longer-term goals
11:12:54 tadej has joined #mlwrome
11:13:38 ...want to prioritize for Chinese, Russian, etc.
11:14:25 final speaker of this session is Paula Shannon of Lionbridge
11:14:48 topic: The Globalization Penalty
11:15:08 Paula Shannon: International SEO
11:16:06 ...McKinsey - "Strong multinationals seem less healthy..."
11:16:56 ...Local firms in emerging markets succeed where multinationals fail
11:17:54 ...Marketing defined as a key function
11:18:05 ...global means complexity which means cost
11:18:26 ...balance central vs local
11:18:37 ...The Consumer Decision Journey
11:19:20 ...written about in the Harvard Business Review
11:20:17 ...many people in the digital age already know what they want to buy before they go to purchase it.
11:21:07 ...consumers in the digital age trust social marketing
11:21:43 ...push branding is irrelevant in the digital age
11:23:13 ...so how do you form your pre-purchase opinion? Search
11:25:07 ...changing the rules: The Global Customer Lifecycle
11:25:53 ...71% decide based on in-language search and peer recommendations
11:26:33 ...3 Biggest Problems: Traffic, Conversions, Management
11:27:59 ...when search is bad: "Can't find... won't buy"
11:30:00 ...Web Localization Maturity Model
11:31:05 philr_ has joined #mlwrome
11:31:35 scribe: philr_
11:31:48 SEO localization generates 15-40% more traffic
11:32:22 ...increase search rankings and traffic
11:33:32 ...be in the top 3 search results by benchmarking against competitors
11:34:32 ...SEO-optimised translation is an iterative process
11:35:01 ...look at baseline, keyword research, translation, QA, repeat
11:37:17 ...percolate keywords throughout content
11:37:37 ...analytics and reporting against competitors
11:38:43 ...it's not just Google. 6/10 popular social networks in China.
11:39:00 Yandex expanding out of Russia
11:39:36 ...pace of change is accelerating
11:40:15 ...global companies need to be hyper-local.
11:41:00 ...utilize local search term experts
11:42:20 ...Long Tail of search terms
11:44:05 ...Methods for executing multilingual Pay Per Click: MT, Local Offices, Human Translation, Localization and Optimisation
11:44:34 ...hosting 1.5 million pages for clients
11:46:00 Q&A
11:46:53 Reinhard Schaler: existing notion of "give up the illusion of control".
11:48:02 ...What is stopping the localization industry from handing over to the users?
11:49:49 Paula Shannon: Localization is the step-child. No clear ROI. Localization has been a cost center thus focused on efficiency and cost reduction
11:50:39 Des Oates: Some companies are making steps to user empowerment
11:51:10 nwaltham_ has joined #mlwrome
11:51:23 ...Adobe has ceded control of certain products to users
11:52:17 Paula Shannon: text enrichment can help
11:53:03 Fernando Serván: monitor demand, traffic analytics
11:53:13 ...demand to drive translation
11:53:59 Charles: when you prioritise translation on demand, how do you decide?
11:54:18 ...how do you balance those things?
11:54:30 ...your goal is to serve existing users 11:55:06 Fernando: it is tricky, I agree but we are trying to understand users better 11:55:54 ...time will tell 11:56:34 Richard Ishida: In India most people are used to ascii.ascii. Yet only a small percentage of users speak English 11:56:51 ...should the market for IDNs be bigger? 11:57:22 Pat Kane: the biggest challenge is the number of languages and scripts 11:58:13 Globalization vs Localization vs Multiculturalism 11:58:30 ...we ignore the multicultural component 11:58:58 ...Fernando mentioned using MT for Arabic and Chinese 11:59:25 ...these are difficult MT languages, do you have specific reasons for using MT for these languages? 11:59:47 Fernando: we know Spanish and French are easier 12:00:13 ...it is difficult to find the volume of translators for Chinese and Arabic 12:01:22 ...SDL: it's important to consider user feedback 12:02:46 Arle: Session ended, break for lunch. After lunch, split into parallel breakout sessions. 12:05:10 rssagent generate draft minutes 12:05:33 rssagent, generate draft minutes 12:06:51 rssagent, draft minutes 12:07:53 rrsagent, draft minutes 12:07:53 I have made the request to generate http://www.w3.org/2013/03/13-mlwrome-minutes.html philr_ 13:35:48 daveL has joined #mlwrome 13:35:52 scrobe: daveL 13:35:57 scribe: DaveL 13:36:18 ... breakout sessions 13:36:19 Des Oates: 13:36:26 ... international domain names, chaired by Pat Kane 13:36:47 .... best practice in multilingual linked open data - chaired by Dom Jones 13:37:10 ... translation quality - chaired by Arle Lommel 13:37:23 from floor 13:37:48 ... interest in translation quality on postediting as well as human translation 13:38:09 des: the ball is now over to the audience to propose other topics 13:38:39 ... aim is not just to discuss but to propose action plans to deliver upon later 13:38:53 Des: some personal findings from workshop to date 13:39:49 ...
input from content creators, advances using ITS2.0 from Cocomore, also with Joomla and use with XLIFF in Drupal and DITA 13:39:49 ... also real world use cases with Spanish tax office 13:40:31 Des: the other big topic is multilingual search and ML SEO, including insights from Paula 13:40:41 ... a big issue for Adobe related to keyword and term management 13:45:03 nwaltham_ has joined #mlwrome 13:48:37 nwaltham___ has joined #mlwrome 13:51:31 daveL has joined #mlwrome 13:51:46 topic: deciding discussion topics for breakout sessions
Chair: Des Oates – Adobe
Topic: selecting topics
International Domain Names
ML-LOD (Dom)
Trans Quality (Arle) in library ~25
Standards – XLIFF, ITS, OASIS, W3C ~4
Crowdsourcing and non-market strategies ~3
Names session (Richard) ~3
MLW-LOD breakout note
Working doc at: http://goo.gl/Th2VA
Topic: Ivan Herman: sem web tech at W3C, updated version LOD cloud
Community needs more deployment, use cases, data and _linked_ data
Underlying tech needs to be seen as stable, so people are not waiting for the next big thing, so W3C is not planning to add anything more to the stack
W3C won’t standardise things, but will rely on community groups, e.g. Open Annotation CG
W3C wants to extend this and perhaps host vocabs with stable URIs, with registry of meta-data. Not a value measure, just cataloguing meta-data and governance and version control of vocab – will include localisation quality
Need better validation, RDF is not well suited to this, so need structural validation (schema-like) and quality validation – but how to validate a multilingual vocab – a question for this vocab
Disconnect between LOD and non-LOD, e.g. CSV, text files etc.
Web site developers use data, not linked data
Reference London Open Data on the Web workshop
Multilinguality in non-RDF data
Topic: Gordon Dunsire: IFLA – similar to earlier presentation in plenary session
Have a few trillion triples potentially, a large high quality collection
Some translated, some not, and partial translations
Have a ML dictionary for authoritative publication of pub categories, but not very accessible to end users or web developers
Topic: Jose Labra: Practical ML LOD guidelines
Naming guidelines: extrapolated from monolingual guidelines
Opaque URI raises some controversy
Topic: Charles Neville: Tools need to be good
Opaque URI not helpful
Yandex is in schema.org, but only looks at microdata rather than RDFa because it was easier, but now might be regretting this and RDFa might be better. But because of this, in Russia microdata is more common than RDFa. At Yandex, people tend to add label/meta-data in English, but it was a better process to do it in Russian – on the whole you perform better in your own language.
Topic: Roberto Navigli: BabelNet – wide ML semantic network, with encyclopedic and lexicographic content from Wikipedia and WordNet http://babelnet.org/
6 languages covered, moving to 40+
3 million synsets
Planning to integrate BabelNet into linguistic linked open data cloud
Contribution to LOD: Make available in lemon, real large ML LOD example
Topic: Haofen Wang: Apex labs, China – data and knowledge management labs http://zhishi.me
Chinese LOD (CLOD) – 8 million instances, 1 billion triples, Chinese Wikipedia, baike.com and Baidu encyclopedia site
Issues: need to use IRI, but limited by use of older browsers
Naming resources, Wikipedia uses traditional Chinese rather than simplified Chinese
Integrating with e-commerce sites 360buy, taobao and soc net weibo and dianping to motivate more open LOD data streams
Align with schema.org
Group Reports:
Topic: Jose: ML-LOD group report
Notes taken on Google Docs.
Topics:
Naming: descriptive vs opaque, depends on the use case, useful for both
Labelling: should always have language tag
Interlinks: sameAs and see also may not be useful in all cases, lexical/linguistic resource interlinking not always the same as conceptual interlinking
Will start community group on best practice in ML LOD
Richard Ishida notes this is easy to set up and join
Topic: Arle: translation quality group report
Need to decouple production method from end use
Source quality is an issue, not always the translator's fault
Quality is dependent on the step in workflow where it is used
Expectations need to be clear
Are existing metrics actually valid and reproducible? Some are academic and not useful for production
Need some process metrics to track these. Ethnographic studies of posteditors. Additional factors, see slides.
So what does multilingual web do to help, three points:
Context, audience and use – common methods for HT, MT and PR MT – but need to be broadened out beyond QTLaunchpad (wee workshop tomorrow)
Don’t reinvent the wheel, harmonise parallel work
An ongoing effort needed perhaps centred on MLW community at W3C
Topic: Pat Kane: International Domain Names group report
It is an ecosystem problem, not just for W3C but other bodies, other voices needed also. Perhaps W3C working group could be a starting point
Topic: David Filip: Standards group report
Gabor, Ionannis, btryan, Yves, DF
CMS-L10n roundtrip, term management, and harmonisation efforts
Seemed to cover many issues in existing ITS2.0-XLIFF mapping, e.g. terminology (usage and forbidden)
But do need some standardised API, connectors and brokers
In terminology need a message broker. Especially in interactive scenario with multiple terminology systems in real time.
Topic: Juan: Names group report
Focus on 3 use cases:
1) Recognition, e.g.
named entity recognition and resolution, focussed on person names, for MT, for search and also segmentation (over line boundaries)
2) Display: sorting names in lists. Contextual usage – formal, familial, full (postal), autocompletion, abbreviation (e.g. in paper author list), text to speech
3) Capturing names: transliteration, speech to text, input form – size, order labels
Problems listed
Propose perhaps defining an ontology of names
Topic: Reinhard Schaler - Crowdsourcing group report (Easyling and FAO)
Discussed different scenarios, e.g. for commercial and for non-market/non-profit, people motivation and associated support systems
Practical implementations: environments need to be easy to use, looking at Easyling, FAO and SOLAS
Too few to set up a bigger group, but invite others to participate.
Topic: Des wrap up and questions
… asks group chairs to come on stage to take questions
Question: Christian Liesk – why do we need an additional terminology related standard, can’t we reuse existing LOD mechanisms?
David F – agrees that linked data can help but need specific support for terminology. This area also suffers from many poorly adopted standards so a new one might make sense.
Christian: good to bring LOD, terminology communities together
David F: agrees, standards harmonisation is key to going forward. But also there is a gap at the API level
Felix: looks for more standardisation people and localisation companies in the ML LOD best practice group.
Des: supports this call
… asks about CMIS
David Filip: yes, work
Pedro: we are going to GALA, so if there is a clear message we will
Ionannis: also visiting an industrial term working
Dave: let's open group now, so we have a concrete URL to point people at
Christian: how will names discussion advance?
Richard: no specific plans
David: asks about locatives in names
Richard: not addressed this yet, as there were more immediate use cases, nor inflection, or other context
Dave: asks if I18n interest group is a good place for this
Richard: community group can be more focussed, IG may reach more people but interest can be more focussed
Felix: agree we need to think about where best to place this. But in all cases we need a hero to drive it forward – the ML-LOD BP seems to have two to three
Richard: +1 need committed driving person
Des: wrap up there, thanks everyone
Topic: Arle – closing remarks
Slides will be available soon, by next week. Linked from programme page
There is streaming video available already provided by FAO, and better quality lectures available from Video Lectures
Report will be produced soon, based on scribes
Thanks to sponsors, Verisign who support workshop and dinner, and QTLaunchpad, who will having, FAO for local support, DFKI and Neives, and Felix. Thanks to EC and their sponsorships, W3C for logistical home, programme committee, organising group, speakers, chairs and scribe (especially Felix).
Funding for conference series comes from MLW-LT which finishes at the end of the year. Waiting to hear on further funding from EU projects, but if anyone has further opportunities for funding or is willing to host future events please talk to Felix or Arle.