Crosslingual linking

From Best Practices for Multilingual Linked Open Data Community Group

Working document for the crosslingual (CL) linking guidelines

Initial brainstorming

[Minutes of the initial brainstorming sessions in this Google doc]


Some relevant readings

[TBC]

  • Jorge Gracia, Elena Montiel-Ponsoda, Philipp Cimiano, Asuncion Gomez-Perez, Paul Buitelaar, and John McCrae. 2012. Challenges for the multilingual Web of Data. Journal of Web Semantics 11 (March 2012), 63–71. https://doi.org/10.1016/j.websem.2011.09.001

Examples of linking at the sense level (translations):

  • Diego Moussallem, Mohamed Ahmed Sherif, Diego Esteves, Marcos Zampieri, and Axel-Cyrille Ngonga Ngomo. 2018. LIdioms: A Multilingual Linked Idioms Data Set. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA). https://aclanthology.org/L18-1392/

Examples of linking at the lexical entry level:

  • Maud Ehrmann, Guillaume Jacquet, and Ralf Steinberger. 2016. JRC-Names: Multilingual entity name variants and titles as Linked Data. Semantic Web 8 (2016), 283–295. https://doi.org/10.3233/SW-160228 [TO CROSS_CHECK]

Examples of linking at the lemma level [valid for monolingual resources, though; can be mentioned as a strategy that is not valid in a CL setting]

  • Francesco Mambrini and Marco Passarotti. 2019. Harmonizing Different Lemmatization Strategies for Building a Knowledge Base of Linguistic Resources for Latin. In Proceedings of the 13th Linguistic Annotation Workshop, pages 71–80, Florence, Italy. Association for Computational Linguistics.

Examples of indirect links (translation):

  • Maud Ehrmann, Francesco Cecconi, Daniele Vannella, John Philip McCrae, Philipp Cimiano, and Roberto Navigli. 2014. Representing Multilingual Data as Linked Data: the Case of BabelNet 2.0. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 401–408, Reykjavik, Iceland. European Language Resources Association (ELRA).

From the telco on 20 January 2023

  • Imagine several monolingual resources, possibly not following the same models, that we want to link. Should we use pairwise multilingual linking or a pivot language?
  • Multilinguality might be considered an important part of the ontology. However, resources are usually built bottom-up: language resources and models only describe language, not semantics/concepts. How can we model this so that any language can be taken as source or target, without needing a bilingual direction for each language combination? The model also needs to achieve linguistic felicity (WordNet, BabelNet, etc.). BabelNet offers rather coarse-grained relations between senses; we need more fine-grained ones and should find ways to model them.
  • Linguistic felicity example: Japanese distinguishes rice as a cereal from cooked rice (comparable to sheep vs. mutton in English), and Indonesian makes the same distinction. In applications we frequently go through English, which loses this distinction (there is only "rice"), so getting to the target language is difficult: the source information is lost when English is used as a pivot in between. We need a way not to lose information in crosslingual/multilingual mapping.
  • CL links can be modeled at different levels: label, lexical entry, sense, ontological level…
  • Practical proposal: set up a document explaining the different options for representing multilingual/crosslingual data, with the pros and cons of each, how to implement them technically, and some examples of real solutions.
  • The challenge is to interlink existing resources crosslingually.
  • Focus more on the representation of CL links than on the process of discovering them (there are many automatic techniques for that, as well as manual curation).
  • The focus should not be on RDF generation (there are other guidelines for this). Instead, focus on the representation of bilingual links at different levels, with the pros and cons of each.
  • The Vauquois triangle shows different models of transfer for machine translation at different levels of semantic depth, from direct translation at the base, through syntactic and semantic transfer, up to an interlingua at the apex.
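The pairwise vs. pivot question and the rice example can be made concrete with a small sketch. It assumes sense-level links are represented as plain (source sense, target sense) pairs standing in for RDF link triples; the sense identifiers (ja:kome, ja:gohan, id:beras, id:nasi, en:rice) are hypothetical and only illustrate the uncooked vs. cooked rice distinction, not any real resource or vocabulary. Composing Japanese and Indonesian links through a single English "rice" sense yields every combination, half of which are wrong, while curated direct links keep the distinction.

```python
from itertools import combinations

# Hypothetical sense identifiers illustrating the rice example:
# kome/beras = uncooked rice grain, gohan/nasi = cooked rice.
JA_TO_EN = {("ja:kome", "en:rice"), ("ja:gohan", "en:rice")}
ID_TO_EN = {("id:beras", "en:rice"), ("id:nasi", "en:rice")}
# Curated direct sense-level links between Japanese and Indonesian.
JA_TO_ID_DIRECT = {("ja:kome", "id:beras"), ("ja:gohan", "id:nasi")}


def compose_via_pivot(src_to_pivot, tgt_to_pivot):
    """Infer src->tgt links by joining both link sets on the shared pivot sense."""
    return {
        (src, tgt)
        for src, p1 in src_to_pivot
        for tgt, p2 in tgt_to_pivot
        if p1 == p2
    }


def bilingual_pairs(languages):
    """Language pairs to curate when every pair is linked directly."""
    return list(combinations(languages, 2))


def pivot_pairs(languages, pivot):
    """Language pairs to curate when every language is linked only to the pivot."""
    return [(lang, pivot) for lang in languages if lang != pivot]


inferred = compose_via_pivot(JA_TO_EN, ID_TO_EN)
spurious = inferred - JA_TO_ID_DIRECT  # links the pivot introduces wrongly
print(sorted(inferred))
print(sorted(spurious))  # [('ja:gohan', 'id:beras'), ('ja:kome', 'id:nasi')]

langs = ["en", "ja", "id", "es", "de"]  # illustrative language set
print(len(bilingual_pairs(langs)), len(pivot_pairs(langs, "en")))  # 10 4
```

The trade-off it shows: with n languages, direct linking needs n(n-1)/2 bilingual link sets while a pivot needs only n-1, but links inferred through the pivot are only as fine-grained as the pivot's own senses.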