TY - JOUR
T1 - A Machine Learning Approach to Cyberbullying Detection in Arabic Tweets
AU - Musleh, Dhiaa
AU - Rahman, Atta
AU - Alkherallah, Mohammed Abbas
AU - Al-Bohassan, Menhal Kamel
AU - Alawami, Mustafa Mohammed
AU - Alsebaa, Hayder Ali
AU - Alnemer, Jawad Ali
AU - Al-Mutairi, Ghazi Fayez
AU - Aldossary, May Issa
AU - Aldowaihi, Dalal A.
AU - Alhaidari, Fahd
N1 - Publisher Copyright:
© 2024 Tech Science Press. All rights reserved.
PY - 2024
Y1 - 2024
N2 - With the rapid growth of internet usage, a new situation has been created that enables practicing bullying. Cyberbullying has increased over the past decade, and it has the same adverse effects as face-to-face bullying, like anger, sadness, anxiety, and fear. With the anonymity people get on the internet, they tend to be more aggressive and express their emotions freely without considering the effects, which can be a reason for the increase in cyberbullying and it is the main motive behind the current study. This study presents a thorough background of cyberbullying and the techniques used to collect, preprocess, and analyze the datasets. Moreover, a comprehensive review of the literature has been conducted to figure out research gaps and effective techniques and practices in cyberbullying detection in various languages, and it was deduced that there is significant room for improvement in the Arabic language. As a result, the current study focuses on the investigation of shortlisted machine learning algorithms in natural language processing (NLP) for the classification of Arabic datasets duly collected from Twitter (also known as X). In this regard, support vector machine (SVM), Naïve Bayes (NB), Random Forest (RF), Logistic regression (LR), Bootstrap aggregating (Bagging), Gradient Boosting (GBoost), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost) were shortlisted and investigated due to their effectiveness in the similar problems. Finally, the scheme was evaluated by well-known performance measures like accuracy, precision, Recall, and F1-score. Consequently, XGBoost exhibited the best performance with 89.95% accuracy, which is promising compared to the state-of-the-art.
AB - With the rapid growth of internet usage, a new situation has been created that enables practicing bullying. Cyberbullying has increased over the past decade, and it has the same adverse effects as face-to-face bullying, like anger, sadness, anxiety, and fear. With the anonymity people get on the internet, they tend to be more aggressive and express their emotions freely without considering the effects, which can be a reason for the increase in cyberbullying and it is the main motive behind the current study. This study presents a thorough background of cyberbullying and the techniques used to collect, preprocess, and analyze the datasets. Moreover, a comprehensive review of the literature has been conducted to figure out research gaps and effective techniques and practices in cyberbullying detection in various languages, and it was deduced that there is significant room for improvement in the Arabic language. As a result, the current study focuses on the investigation of shortlisted machine learning algorithms in natural language processing (NLP) for the classification of Arabic datasets duly collected from Twitter (also known as X). In this regard, support vector machine (SVM), Naïve Bayes (NB), Random Forest (RF), Logistic regression (LR), Bootstrap aggregating (Bagging), Gradient Boosting (GBoost), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost) were shortlisted and investigated due to their effectiveness in the similar problems. Finally, the scheme was evaluated by well-known performance measures like accuracy, precision, Recall, and F1-score. Consequently, XGBoost exhibited the best performance with 89.95% accuracy, which is promising compared to the state-of-the-art.
KW - Arabic tweets
KW - cyberbullying
KW - ensemble learning
KW - NLP
KW - Supervised machine learning
UR - https://www.scopus.com/pages/publications/85200437897
U2 - 10.32604/cmc.2024.048003
DO - 10.32604/cmc.2024.048003
M3 - Article
AN - SCOPUS:85200437897
SN - 1546-2218
VL - 80
SP - 1033
EP - 1054
JO - Computers, Materials and Continua
JF - Computers, Materials and Continua
IS - 1
ER -