Multilingual Hate Speech Detection: Innovations in Optimized Deep Learning for English and Arabic Hate Speech Detection

  • Hassan AL-Sukhani
  • Qusay Bsoul
  • Abdelrahman H. Elhawary
  • Ziad M. Nasr
  • Ahmed E. Mansour*
  • Radwan M. Batyha
  • Basma S. Alqadi
  • Jehad Saad Alqurni
  • Hayat Alfagham
  • Magda M. Madbouly

* Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

This paper presents a multilingual hate speech detection model that processes and classifies content in both Arabic and English. The study compares traditional machine learning models, namely K-Nearest Neighbors (KNN), Naive Bayes, and Support Vector Machines (SVM), with a deep learning model, the Bidirectional Long Short-Term Memory (Bi-LSTM) network. A key challenge addressed is the classification of mixed-language content, which is common on social media platforms in the Middle East and North Africa (MENA) region. To improve detection performance, preprocessing techniques were applied to the text data, and the Synthetic Minority Over-sampling Technique (SMOTE) was used to balance the dataset. The results show that the Bi-LSTM model outperformed the traditional machine learning approaches, particularly in identifying hate speech across multiple languages. The proposed model achieves higher accuracy and greater robustness on mixed-language input, making it better suited to real-world hate speech detection tasks.
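
The abstract names SMOTE as the balancing step but gives no parameters, so the following is only a minimal sketch of the general SMOTE idea, not the authors' implementation. It assumes the texts have already been converted to numeric feature vectors (the abstract does not say which representation was used), and the function names `smote` and `balance`, the neighbour count `k=5`, and the toy data are hypothetical choices made for illustration. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic vectors from a list of minority-class vectors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label, k=5, seed=0):
    """Oversample the minority class until both classes have equal counts."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max((len(y) - len(minority)) - len(minority), 0)
    new_points = smote(minority, n_new, k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy data (hypothetical): four "not hate" vectors (0), two "hate" vectors (1).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = balance(X, y, minority_label=1)
```

After the call, `y_bal` contains four samples of each class, and the two added vectors lie between the original minority points. Oversampling of this kind is normally applied to the training split only, so that synthetic samples do not leak into evaluation. In practice a maintained implementation such as the `SMOTE` class in the imbalanced-learn library would usually be preferred over hand-written code.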

Original language: English
Article number: 205
Journal: SN Computer Science
Volume: 6
Issue number: 3
State: Published - Mar 2025

Keywords

  • Class imbalance
  • Deep learning
  • Hate speech detection
  • Machine learning
  • Mixed Arabic and English training
  • Multilingual NLP
  • Natural language processing (NLP)
  • Sentiment analysis
