Conference paper

Hybrid Emoji-Based Masked Language Models for Zero-Shot Abusive Language Detection

Michele Corazza (1), Stefano Menini (2), Elena Cabrio (3), Sara Tonelli (2), Serena Villata (3)
(3) WIMMICS - Web-Instrumented Man-Machine Interactions, Communities and Semantics
CRISAM - Inria Sophia Antipolis - Méditerranée, Laboratoire I3S - SPARKS - Scalable and Pervasive softwARe and Knowledge Systems
Abstract: Recent studies have demonstrated the effectiveness of cross-lingual language model pre-training on different NLP tasks, such as natural language inference and machine translation. In our work, we test this approach on social media data, which are particularly challenging to process within this framework, since the limited length of the textual messages and the irregularity of the language make it harder to learn meaningful encodings. More specifically, we propose a hybrid emoji-based Masked Language Model (MLM) to leverage the common information conveyed by emojis across different languages and improve the learned cross-lingual representation of short text messages, with the goal of performing zero-shot abusive language detection. We compare the results obtained with the original MLM to those obtained with our method, showing improved performance on German, Italian and Spanish.
Document type: Conference paper

Cited literature [40 references]

https://hal.archives-ouvertes.fr/hal-02972203
Contributor: Serena Villata
Submitted on: Tuesday, October 20, 2020 - 11:28:47
Last modified on: Wednesday, October 21, 2020 - 03:40:31

File

Emoji_Based_Hate_Speech_EMNLP_...
Files produced by the author(s)

Identifiers

  • HAL Id: hal-02972203, version 1

Citation

Michele Corazza, Stefano Menini, Elena Cabrio, Sara Tonelli, Serena Villata. Hybrid Emoji-Based Masked Language Models for Zero-Shot Abusive Language Detection. Findings of ACL: EMNLP 2020, Nov 2020, Virtual, France. ⟨hal-02972203⟩

Metrics

Record views: 42

File downloads: 74