Multilingual Hate Speech Detection: Innovations in Optimized Deep Learning for English and Arabic Hate Speech Detection

Hassan AL-Sukhani; Qusay Bsoul; Abdelrahman H. Elhawary; Ziad M. Nasr; Ahmed E. Mansour; Radwan M. Batyha; Basma S. Alqadi; Jehad Saad Alqurni; Hayat Alfagham; Magda M. Madbouly

doi:10.1007/s42979-025-03741-8

Corresponding author: Ahmed E. Mansour

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

This paper presents the development of a multilingual hate speech detection model that effectively processes and classifies content in both Arabic and English. The study leverages both traditional machine learning models, such as K-Nearest Neighbors (KNN), Naive Bayes, and Support Vector Machines (SVM), as well as advanced deep learning models, specifically Bi-directional Long Short-Term Memory (Bi-LSTM) networks. A key challenge addressed is the classification of mixed-language content, which is common on social media platforms in the MENA region. To enhance detection performance, preprocessing techniques were applied to the text data, and the Synthetic Minority Over-sampling Technique (SMOTE) was used to balance the dataset. The results show that the Bi-LSTM model outperformed traditional machine learning approaches, particularly in identifying hate speech across multiple languages. The proposed model demonstrates superior accuracy and robustness in handling mixed-language input, providing a more effective solution for real-world hate speech detection tasks.
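The abstract names SMOTE as the balancing step but does not say which implementation was used or which text features it was applied to. The following is a minimal sketch of the general SMOTE idea only, assuming the text has already been converted to numeric feature vectors: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbors. The function name `smote`, its parameters, and the toy data are hypothetical and are not taken from the paper.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority samples by interpolating between neighbors."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point, excluding the point itself
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy 2-D vectors standing in for vectorized text (hypothetical data).
majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.2, 0.4], [0.4, 0.1]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]

# Oversample the minority class until both classes have the same size.
extra = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + extra
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into evaluation data.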

Original language	English
Article number	205
Journal	SN Computer Science
Volume	6
Issue number	3
DOIs	https://doi.org/10.1007/s42979-025-03741-8
State	Published - Mar 2025

Keywords

Class imbalance
Deep learning
Hate speech detection
Machine learning
Mixed Arabic and English training
Multilingual NLP
Natural language processing (NLP)
Sentiment analysis

Cite this

@article{466c3828a969470ea948f970cc7662c5,

title = "Multilingual Hate Speech Detection: Innovations in Optimized Deep Learning for English and Arabic Hate Speech Detection",

abstract = "This paper presents the development of a multilingual hate speech detection model that effectively processes and classifies content in both Arabic and English. The study leverages both traditional machine learning models, such as K-Nearest Neighbors (KNN), Naive Bayes, and Support Vector Machines (SVM), as well as advanced deep learning models, specifically Bi-directional Long Short-Term Memory (Bi-LSTM) networks. A key challenge addressed is the classification of mixed-language content, which is common on social media platforms in the MENA region. To enhance detection performance, preprocessing techniques were applied to the text data, and the Synthetic Minority Over-sampling Technique (SMOTE) was used to balance the dataset. The results show that the Bi-LSTM model outperformed traditional machine learning approaches, particularly in identifying hate speech across multiple languages. The proposed model demonstrates superior accuracy and robustness in handling mixed-language input, providing a more effective solution for real-world hate speech detection tasks.",

keywords = "Class imbalance, Deep learning, Hate speech detection, Machine learning, Mixed Arabic and English training, Multilingual NLP, Natural language processing (NLP), Sentiment analysis",

author = "AL-Sukhani, Hassan and Bsoul, Qusay and Elhawary, Abdelrahman H. and Nasr, Ziad M. and Mansour, Ahmed E. and Batyha, Radwan M. and Alqadi, Basma S. and Alqurni, Jehad Saad and Alfagham, Hayat and Madbouly, Magda M.",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2025.",

year = "2025",

month = mar,

doi = "10.1007/s42979-025-03741-8",

language = "English",

volume = "6",

journal = "SN Computer Science",

issn = "2662-995X",

number = "3",

}

TY - JOUR

T1 - Multilingual Hate Speech Detection

T2 - Innovations in Optimized Deep Learning for English and Arabic Hate Speech Detection

AU - AL-Sukhani, Hassan

AU - Bsoul, Qusay

AU - Elhawary, Abdelrahman H.

AU - Nasr, Ziad M.

AU - Mansour, Ahmed E.

AU - Batyha, Radwan M.

AU - Alqadi, Basma S.

AU - Alqurni, Jehad Saad

AU - Alfagham, Hayat

AU - Madbouly, Magda M.

N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2025.

PY - 2025/3

Y1 - 2025/3

AB - This paper presents the development of a multilingual hate speech detection model that effectively processes and classifies content in both Arabic and English. The study leverages both traditional machine learning models, such as K-Nearest Neighbors (KNN), Naive Bayes, and Support Vector Machines (SVM), as well as advanced deep learning models, specifically Bi-directional Long Short-Term Memory (Bi-LSTM) networks. A key challenge addressed is the classification of mixed-language content, which is common on social media platforms in the MENA region. To enhance detection performance, preprocessing techniques were applied to the text data, and the Synthetic Minority Over-sampling Technique (SMOTE) was used to balance the dataset. The results show that the Bi-LSTM model outperformed traditional machine learning approaches, particularly in identifying hate speech across multiple languages. The proposed model demonstrates superior accuracy and robustness in handling mixed-language input, providing a more effective solution for real-world hate speech detection tasks.

KW - Class imbalance

KW - Deep learning

KW - Hate speech detection

KW - Machine learning

KW - Mixed Arabic and English training

KW - Multilingual NLP

KW - Natural language processing (NLP)

KW - Sentiment analysis

UR - https://www.scopus.com/pages/publications/85218704805

U2 - 10.1007/s42979-025-03741-8

DO - 10.1007/s42979-025-03741-8

M3 - Article

AN - SCOPUS:85218704805

SN - 2662-995X

VL - 6

JO - SN Computer Science

JF - SN Computer Science

IS - 3

M1 - 205

ER -
