Academic Profile

I am a Research Fellow at the Data Science Institute (DSI) and an Adjunct Lecturer at the School of Computer Science at the National University of Ireland Galway (NUI Galway). I work in the Unit for Natural Language Processing (UNLP) under the supervision of Dr Paul Buitelaar, where my main research focuses on injecting terminology and knowledge graphs into neural machine translation architectures. More recently, I have also been following work on dialogue systems and natural language generation with multi-modal data. I am primarily funded by the nationwide Insight SFI Research Centre for Data Analytics, which is also hosted within DSI.

Interests: • Natural Language Processing • Machine Translation • Dialogue Systems • Knowledge Graphs

Activities

Co-supervision of PhD and Master Students

2017-

Teaching - as part of the MSc in Computer Science @NUI Galway - Data Analytics and AI.

2016-
  • Advanced Natural Language Processing (2018/2019 - )
  • Introduction to Natural Language Processing (2016/2017, 2017/2018)

Event Organisation

  • Organising Committee ESSLLI 2022
  • Workshop on Multilingualism at the intersection of Knowledge Bases and Machine Translation (MomenT):
    @LREC 2018 (proceedings), @MT-Summit 2019 (proceedings)
  • Local organiser of Language, Data and Knowledge (LDK) 2017, Galway, Ireland
  • Local organiser ESSLLI 2011

Program Committee

  • Member of the program committee for various conferences and workshops, including ACL, EMNLP, LREC, ECAI, EAMT, MT-SUMMIT, WMT, ISWC, ESWC.

Invited Talks

  • Meet Central Europe (MCE 2018) on The Neural Age of Machine Translation. Budapest, Hungary.
  • Translation Technology Terminology Conference (TTT-2014) on Statistical Machine Translation and Terminology. Bled, Slovenia
  • Translation Technology Terminology Conference (TTT-2013) on Statistical Machine Translation. Zadar, Croatia.

Projects

NURS - The project focused on neural machine translation (NMT) for under-resourced scenarios, i.e. languages or technical domains, using sequence-to-sequence models.
ELEXIS - The ELEXIS project benefits from the expertise of some of the top experts in the fields of lexicography, linguistics and natural language processing, who agreed to share their experience and contribute their efforts to the success of the project.
MixedEmotions - MixedEmotions developed innovative multilingual multi-modal Big Data analytics applications to analyse a more complete emotional profile of user behaviour using data from mixed input channels.
EuroSentiment - The project aimed to develop a large shared data pool for language resources meant to be used by sentiment analysis systems, in order to bundle together scattered resources.
LIDER - The project’s mission was to provide the basis for the creation of a Linguistic Linked Data cloud that can support content analytics tasks of unstructured multilingual cross-media content.
Monnet - The Monnet project provided a semantics-based solution for accessing information across language barriers.

Demos

Marvin - a conversational chatbot with major depressive disorder detection.
GrumpyBot - a sequence-to-sequence deep learning chatbot using linked data.
IRIS - an SMT/NMT system translating from English to (less-resourced) Irish and vice versa.
ASISTENT - an SMT/NMT system translating between English and morphologically rich South Slavic languages.
Insight META System - an SMT system accessible through an API request, supporting all official European Union languages.
OTTO - an SMT system for multilingual enhancement of ontologies. (CURRENTLY OFFLINE)
TeTra - a system for extracting and translating domain-specific vocabulary. (CURRENTLY OFFLINE)

Resources

BitterCorpus – Bilingual IT Terminology Annotated Corpus - annotated corpora for the evaluation of monolingual and bilingual domain-specific term extraction (with HLT FBK).
PE²rr - the PostEdited and ERRor annotated corpus, which covers machine translations, their post-edited versions and error annotations of the performed edit operations.
Polylingual WordNet - extends WordNet to 23 languages by automatic translation and is released both as OntoLex JSON-LD and in the Global WordNet LMF format. This resource is available for re-use under the Creative Commons Attribution 4.0 License.

Publications

Selected publications; please also visit my Google Scholar page (BibTeX).

  • Enhancing statistical machine translation with bilingual terminology in a CAT environment
    Mihael Arcan, Marco Turchi, Sara Tonelli, Paul Buitelaar
    Proceedings of the 11th Biennial Conference of the Association for Machine Translation in the Americas (AMTA 2014)
  • Knowledge Portability with Semantic Expansion of Ontology Labels
    Mihael Arcan, Marco Turchi, Paul Buitelaar
    Proceedings of the Association for Computational Linguistics (ACL-2015)
  • Expanding wordnets to new languages with multilingual sense disambiguation
    Mihael Arcan, John P McCrae, Paul Buitelaar
    International Conference on Computational Linguistics (COLING-2016)
  • Leveraging bilingual terminology to improve machine translation in a CAT environment
    Mihael Arcan, Marco Turchi, Sara Tonelli, Paul Buitelaar
    Natural Language Engineering 23(5)