Explainable Classification Model for Android Malware Analysis Using API and Permission-Based Features

Nida Aslam*; Irfan Ullah Khan; Salma Abdulrahman Bader; Aisha Alansari; Lama Abdullah Alaqeel; Razan Mohammed Khormy; Zahra Abdultawab AlKubaish; Tariq Hussain*

*Corresponding author for this work

doi:10.32604/cmc.2023.039721

Computer Science Department

Research output: Contribution to journal › Article › peer-review

11 Scopus citations

Abstract

One of the most widely used smartphone operating systems, Android, is vulnerable to cutting-edge malware that employs sophisticated logic. Such malware attacks could lead to unauthorized actions on victims’ devices, theft of personal information, and hardware damage. Previous studies have shown that machine learning (ML) is effective at detecting malware events and classifying their types. However, attackers continuously develop more sophisticated methods to bypass detection, so proactive models for detecting malware events on Android devices must be built on up-to-date datasets. This study therefore employed ML algorithms to classify Android applications as malware or goodware using permission-based and application programming interface (API)-based features from a recent dataset. To address the dataset's class imbalance, RandomOverSampler, the synthetic minority oversampling technique with Tomek links (SMOTETomek), and RandomUnderSampler were applied to the dataset in separate experiments. The extra trees (ET) classifier achieved the highest accuracy, 99.53%, with an elapsed time of 0.0198 s in the experiment that used RandomOverSampler. Furthermore, an explainable artificial intelligence (EAI) technique was applied to add transparency to the high-performing ET classifier. The global explanation based on Shapley values indicated that the top three features contributing to the goodware class are Ljava/net/URL;->openConnection, Landroid/location/LocationManager;->getLastKnownLocation, and Vibrate, whereas the top three features contributing to the malware class are Receive_Boot_Completed, Get_Tasks, and Kill_Background_Processes. The authors believe that the proposed model can contribute to proactive detection of malware events on Android devices, reducing the number of victims and increasing users’ trust.
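The abstract describes apps represented by permission and API-call features and a class imbalance handled by resampling. The following is a minimal sketch, not the authors' code, of how that representation and the random-oversampling idea could look in plain Python. The names VOCAB, to_binary_vector, and random_oversample are hypothetical, the six features are simply the ones listed in the abstract, and the toy apps are made up. The resampler names in the abstract match classes in the imbalanced-learn library, but this sketch re-implements only the basic idea of random oversampling and does not show the ET classifier or the Shapley-value explanation.

```python
import random
from collections import Counter

# Hypothetical vocabulary: six feature names taken from the abstract.
# A real permission/API dataset has far more columns than this.
VOCAB = [
    "Receive_Boot_Completed",
    "Get_Tasks",
    "Kill_Background_Processes",
    "Vibrate",
    "Ljava/net/URL;->openConnection",
    "Landroid/location/LocationManager;->getLastKnownLocation",
]


def to_binary_vector(features, vocab=VOCAB):
    """Encode the permissions/API calls observed in one app as 0/1 columns."""
    present = set(features)
    return [1 if name in present else 0 for name in vocab]


def random_oversample(X, y, seed=0):
    """Append randomly chosen copies of minority-class rows until every
    class has as many rows as the largest class. Illustrates the idea of
    random oversampling; it is not the imbalanced-learn implementation."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)
    for label, count in counts.items():
        rows = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(list(rng.choice(rows)))  # copy the row, do not alias it
            y_out.append(label)
    return X_out, y_out


if __name__ == "__main__":
    # Toy, made-up apps: one malware sample and three goodware samples.
    apps = [
        (["Receive_Boot_Completed", "Get_Tasks"], "malware"),
        (["Vibrate", "Ljava/net/URL;->openConnection"], "goodware"),
        (["Vibrate"], "goodware"),
        (["Ljava/net/URL;->openConnection"], "goodware"),
    ]
    X = [to_binary_vector(feats) for feats, _ in apps]
    y = [label for _, label in apps]
    X_bal, y_bal = random_oversample(X, y)
    print(sorted(Counter(y_bal).items()))  # [('goodware', 3), ('malware', 3)]
```

Because random oversampling only duplicates existing rows, it is usually applied to the training split alone so that copies of the same app do not appear in both the training and test sets.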

Original language	English
Pages (from-to)	3167-3188
Number of pages	22
Journal	Computers, Materials and Continua
Volume	76
Issue number	3
DOIs	https://doi.org/10.32604/cmc.2023.039721
State	Published - 2023

Keywords

Android malware
cyber security
explainable artificial intelligence
machine learning
malware detection

Cite this

@article{3feb2e94186f407991bb64c0f70f7fe4,

title = "Explainable Classification Model for Android Malware Analysis Using API and Permission-Based Features",

abstract = "One of the most widely used smartphone operating systems, Android, is vulnerable to cutting-edge malware that employs sophisticated logic. Such malware attacks could lead to the execution of unauthorized acts on the victims{\textquoteright} devices, stealing personal information and causing hardware damage. In previous studies, machine learning (ML) has shown its efficacy in detecting malware events and classifying their types. However, attackers are continuously developing more sophisticated methods to bypass detection. Therefore, up-to-date datasets must be utilized to implement proactive models for detecting malware events in Android mobile devices. Therefore, this study employed ML algorithms to classify Android applications into malware or goodware using permission and application programming interface (API)-based features from a recent dataset. To overcome the dataset imbalance issue, RandomOverSampler, synthetic minority oversampling with tomek links (SMOTETomek), and RandomUnderSampler were applied to the Dataset in different experiments. The results indicated that the extra tree (ET) classifier achieved the highest accuracy of 99.53\% within an elapsed time of 0.0198 s in the experiment that utilized the RandomOverSampler technique. Furthermore, the explainable Artificial Intelligence (EAI) technique has been applied to add transparency to the high-performance ET classifier. The global explanation using the Shapely values indicated that the top three features contributing to the goodware class are: Ljava/net/URL;>openConnection, Landroid/location/LocationManager;->getLastKgoodwarewnLocation, and Vibrate. On the other hand, the top three features contributing to the malware class are Receive\_Boot\_Completed, Get\_Tasks, and Kill\_Background\_Processes. It is believed that the proposed model can contribute to proactively detecting malware events in Android devices to reduce the number of victims and increase users{\textquoteright} trust.",

keywords = "Android malware, cyber security, explainable artificial intelligence, machine learning, malware detection",

author = "Aslam, Nida and Khan, Irfan Ullah and Bader, Salma Abdulrahman and Alansari, Aisha and Alaqeel, Lama Abdullah and Khormy, Razan Mohammed and AlKubaish, Zahra Abdultawab and Hussain, Tariq",

year = "2023",

doi = "10.32604/cmc.2023.039721",

language = "English",

volume = "76",

pages = "3167--3188",

journal = "Computers, Materials and Continua",

issn = "1546-2218",

number = "3",

}

TY - JOUR

T1 - Explainable Classification Model for Android Malware Analysis Using API and Permission-Based Features

AU - Aslam, Nida

AU - Khan, Irfan Ullah

AU - Bader, Salma Abdulrahman

AU - Alansari, Aisha

AU - Alaqeel, Lama Abdullah

AU - Khormy, Razan Mohammed

AU - AlKubaish, Zahra Abdultawab

AU - Hussain, Tariq

PY - 2023

Y1 - 2023

KW - Android malware

KW - cyber security

KW - explainable artificial intelligence

KW - machine learning

KW - malware detection

UR - https://www.scopus.com/pages/publications/85174396565

U2 - 10.32604/cmc.2023.039721

DO - 10.32604/cmc.2023.039721

M3 - Article

AN - SCOPUS:85174396565

SN - 1546-2218

VL - 76

SP - 3167

EP - 3188

JO - Computers, Materials and Continua

JF - Computers, Materials and Continua

IS - 3

ER -

Press/Media

New Findings in Technology Described from Imam Abdulrahman Bin Faisal University (Explainable Classification Model for Android Malware Analysis Using Api and Permission-based Features)