Abstract
This paper presents a multilingual hate speech detection model that processes and classifies content in both Arabic and English. The study evaluates traditional machine learning models, such as K-Nearest Neighbors (KNN), Naive Bayes, and Support Vector Machines (SVM), alongside deep learning models, specifically Bi-directional Long Short-Term Memory (Bi-LSTM) networks. A key challenge addressed is the classification of mixed-language content, which is common on social media platforms in the Middle East and North Africa (MENA) region. To improve detection performance, the text data were preprocessed and the Synthetic Minority Over-sampling Technique (SMOTE) was used to balance the dataset. The results show that the Bi-LSTM model outperformed the traditional machine learning approaches, particularly in identifying hate speech across multiple languages. The proposed model is more accurate and more robust on mixed-language input than the traditional models, making it better suited to real-world hate speech detection tasks.
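The abstract names SMOTE as the balancing step but does not describe its configuration. As a rough illustration of the idea only, the sketch below generates synthetic minority-class vectors by interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote`, the neighbour count `k=2`, and the toy two-dimensional vectors are all assumptions made for this example; they are not taken from the paper, whose features would more plausibly be text representations of much higher dimension.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Return n_new synthetic vectors built by SMOTE-style interpolation.

    Hypothetical helper for illustration; not the paper's implementation.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        # New point lies on the segment between base and the chosen neighbour.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy feature vectors (assumed stand-ins for real text features).
majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.1]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3]]

# Oversample the minority class until both classes have the same size.
balanced_minority = minority + smote(minority, len(majority) - len(minority))
```

Because every synthetic point lies on a segment between two existing minority samples, the oversampled class stays within the region the real minority data already occupies. In practice a maintained implementation, for example the `SMOTE` class in the imbalanced-learn library, would normally be used instead of a hand-written helper, and oversampling would be applied to the training split only.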
| Original language | English |
|---|---|
| Article number | 205 |
| Journal | SN Computer Science |
| Volume | 6 |
| Issue number | 3 |
| State | Published - Mar 2025 |
Keywords
- Class imbalance
- Deep learning
- Hate speech detection
- Machine learning
- Mixed Arabic and English training
- Multilingual NLP
- Natural language processing (NLP)
- Sentiment analysis