A Machine Learning Approach to Cyberbullying Detection in Arabic Tweets

Dhiaa Musleh; Atta Rahman; Mohammed Abbas Alkherallah; Menhal Kamel Al-Bohassan; Mustafa Mohammed Alawami; Hayder Ali Alsebaa; Jawad Ali Alnemer; Ghazi Fayez Al-Mutairi; May Issa Aldossary; Dalal A. Aldowaihi; Fahd Alhaidari

doi:10.32604/cmc.2024.048003

Dhiaa Musleh, Atta Rahman^*, Mohammed Abbas Alkherallah, Menhal Kamel Al-Bohassan, Mustafa Mohammed Alawami, Hayder Ali Alsebaa, Jawad Ali Alnemer, Ghazi Fayez Al-Mutairi, May Issa Aldossary, Dalal A. Aldowaihi, Fahd Alhaidari

^*Corresponding author for this work

Imam Abdulrahman Bin Faisal University

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

With the rapid growth of internet usage, a new situation has been created that enables practicing bullying. Cyberbullying has increased over the past decade, and it has the same adverse effects as face-to-face bullying, like anger, sadness, anxiety, and fear. With the anonymity people get on the internet, they tend to be more aggressive and express their emotions freely without considering the effects, which can be a reason for the increase in cyberbullying and it is the main motive behind the current study. This study presents a thorough background of cyberbullying and the techniques used to collect, preprocess, and analyze the datasets. Moreover, a comprehensive review of the literature has been conducted to figure out research gaps and effective techniques and practices in cyberbullying detection in various languages, and it was deduced that there is significant room for improvement in the Arabic language. As a result, the current study focuses on the investigation of shortlisted machine learning algorithms in natural language processing (NLP) for the classification of Arabic datasets duly collected from Twitter (also known as X). In this regard, support vector machine (SVM), Naïve Bayes (NB), Random Forest (RF), Logistic regression (LR), Bootstrap aggregating (Bagging), Gradient Boosting (GBoost), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost) were shortlisted and investigated due to their effectiveness in similar problems. Finally, the scheme was evaluated by well-known performance measures like accuracy, precision, recall, and F1-score. Consequently, XGBoost exhibited the best performance with 89.95% accuracy, which is promising compared to the state-of-the-art.

Original language	English
Pages (from-to)	1033-1054
Number of pages	22
Journal	Computers, Materials and Continua
Volume	80
Issue number	1
DOIs	https://doi.org/10.32604/cmc.2024.048003
State	Published - 2024

Keywords

Arabic tweets
cyberbullying
ensemble learning
NLP
Supervised machine learning

Access to Document

10.32604/cmc.2024.048003

Cite this

@article{17ab9597d8b64d79a414ac74c42a0cc7,

title = "A Machine Learning Approach to Cyberbullying Detection in Arabic Tweets",

abstract = "With the rapid growth of internet usage, a new situation has been created that enables practicing bullying. Cyberbullying has increased over the past decade, and it has the same adverse effects as face-to-face bullying, like anger, sadness, anxiety, and fear. With the anonymity people get on the internet, they tend to be more aggressive and express their emotions freely without considering the effects, which can be a reason for the increase in cyberbullying and it is the main motive behind the current study. This study presents a thorough background of cyberbullying and the techniques used to collect, preprocess, and analyze the datasets. Moreover, a comprehensive review of the literature has been conducted to figure out research gaps and effective techniques and practices in cyberbullying detection in various languages, and it was deduced that there is significant room for improvement in the Arabic language. As a result, the current study focuses on the investigation of shortlisted machine learning algorithms in natural language processing (NLP) for the classification of Arabic datasets duly collected from Twitter (also known as X). In this regard, support vector machine (SVM), Na{\"i}ve Bayes (NB), Random Forest (RF), Logistic regression (LR), Bootstrap aggregating (Bagging), Gradient Boosting (GBoost), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost) were shortlisted and investigated due to their effectiveness in similar problems. Finally, the scheme was evaluated by well-known performance measures like accuracy, precision, recall, and F1-score. Consequently, XGBoost exhibited the best performance with 89.95\% accuracy, which is promising compared to the state-of-the-art.",

keywords = "Arabic tweets, cyberbullying, ensemble learning, NLP, Supervised machine learning",

author = "Dhiaa Musleh and Atta Rahman and Alkherallah, {Mohammed Abbas} and Al-Bohassan, {Menhal Kamel} and Alawami, {Mustafa Mohammed} and Alsebaa, {Hayder Ali} and Alnemer, {Jawad Ali} and Al-Mutairi, {Ghazi Fayez} and Aldossary, {May Issa} and Aldowaihi, {Dalal A.} and Fahd Alhaidari",

year = "2024",

doi = "10.32604/cmc.2024.048003",

language = "English",

volume = "80",

pages = "1033--1054",

journal = "Computers, Materials and Continua",

issn = "1546-2218",

number = "1",

}

TY - JOUR

T1 - A Machine Learning Approach to Cyberbullying Detection in Arabic Tweets

AU - Musleh, Dhiaa

AU - Rahman, Atta

AU - Alkherallah, Mohammed Abbas

AU - Al-Bohassan, Menhal Kamel

AU - Alawami, Mustafa Mohammed

AU - Alsebaa, Hayder Ali

AU - Alnemer, Jawad Ali

AU - Al-Mutairi, Ghazi Fayez

AU - Aldossary, May Issa

AU - Aldowaihi, Dalal A.

AU - Alhaidari, Fahd

PY - 2024

Y1 - 2024

N2 - With the rapid growth of internet usage, a new situation has been created that enables practicing bullying. Cyberbullying has increased over the past decade, and it has the same adverse effects as face-to-face bullying, like anger, sadness, anxiety, and fear. With the anonymity people get on the internet, they tend to be more aggressive and express their emotions freely without considering the effects, which can be a reason for the increase in cyberbullying and it is the main motive behind the current study. This study presents a thorough background of cyberbullying and the techniques used to collect, preprocess, and analyze the datasets. Moreover, a comprehensive review of the literature has been conducted to figure out research gaps and effective techniques and practices in cyberbullying detection in various languages, and it was deduced that there is significant room for improvement in the Arabic language. As a result, the current study focuses on the investigation of shortlisted machine learning algorithms in natural language processing (NLP) for the classification of Arabic datasets duly collected from Twitter (also known as X). In this regard, support vector machine (SVM), Naïve Bayes (NB), Random Forest (RF), Logistic regression (LR), Bootstrap aggregating (Bagging), Gradient Boosting (GBoost), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost) were shortlisted and investigated due to their effectiveness in similar problems. Finally, the scheme was evaluated by well-known performance measures like accuracy, precision, recall, and F1-score. Consequently, XGBoost exhibited the best performance with 89.95% accuracy, which is promising compared to the state-of-the-art.

AB - With the rapid growth of internet usage, a new situation has been created that enables practicing bullying. Cyberbullying has increased over the past decade, and it has the same adverse effects as face-to-face bullying, like anger, sadness, anxiety, and fear. With the anonymity people get on the internet, they tend to be more aggressive and express their emotions freely without considering the effects, which can be a reason for the increase in cyberbullying and it is the main motive behind the current study. This study presents a thorough background of cyberbullying and the techniques used to collect, preprocess, and analyze the datasets. Moreover, a comprehensive review of the literature has been conducted to figure out research gaps and effective techniques and practices in cyberbullying detection in various languages, and it was deduced that there is significant room for improvement in the Arabic language. As a result, the current study focuses on the investigation of shortlisted machine learning algorithms in natural language processing (NLP) for the classification of Arabic datasets duly collected from Twitter (also known as X). In this regard, support vector machine (SVM), Naïve Bayes (NB), Random Forest (RF), Logistic regression (LR), Bootstrap aggregating (Bagging), Gradient Boosting (GBoost), Light Gradient Boosting Machine (LightGBM), Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost) were shortlisted and investigated due to their effectiveness in similar problems. Finally, the scheme was evaluated by well-known performance measures like accuracy, precision, recall, and F1-score. Consequently, XGBoost exhibited the best performance with 89.95% accuracy, which is promising compared to the state-of-the-art.

KW - Arabic tweets

KW - cyberbullying

KW - ensemble learning

KW - NLP

KW - Supervised machine learning

UR - https://www.scopus.com/pages/publications/85200437897

U2 - 10.32604/cmc.2024.048003

DO - 10.32604/cmc.2024.048003

M3 - Article

AN - SCOPUS:85200437897

SN - 1546-2218

VL - 80

SP - 1033

EP - 1054

JO - Computers, Materials and Continua

JF - Computers, Materials and Continua

IS - 1

ER -

Press/Media

Investigators at Imam Abdulrahman Bin Faisal University Report Findings in Bullying (A Machine Learning Approach To Cyberbullying Detection In Arabic Tweets)
