TY - JOUR
T1 - Convolutional transform learning based fusion framework for scale invariant long term target detection and tracking in unmanned aerial vehicles
AU - Alrayes, Fatma S.
AU - Ahmad, Nazir
AU - Alshuhail, Asma
AU - Alshammeri, Menwa
AU - Alqazzaz, Ali
AU - Alkhiri, Hassan
AU - Alqurni, Jehad Saad
AU - Said, Yahia
N1 - Publisher Copyright:
© The Author(s) 2025.
PY - 2025/12
Y1 - 2025/12
N2 - Unmanned aerial vehicles (UAVs) have become increasingly available devices widely used as environmental monitoring systems. With the benefit of high mobility, UAVs power various significant computer vision (CV) applications, offering greater effectiveness and accessibility than surveillance cameras with a fixed view, angle, and scale. Nevertheless, owing to camera motion and complex environments, target detection from UAVs is difficult; conventional models frequently miss detections and raise false alarms. Drone-mounted cameras monitor objects at changing altitudes, leading to substantial scale variations. The proposed model increases target accuracy and decreases false positives using real-time data and machine learning (ML) methods. Its applications range from military operations to urban planning and wildlife monitoring. Therefore, this study develops a novel long-term target detection and tracking model for unmanned aerial vehicles using a deep fusion-based convolutional transform learning (LTTDT–UAVDFCTL) model. The LTTDT–UAVDFCTL model aims to improve the robustness and accuracy of target detection and tracking in scale-variant environments. First, the LTTDT–UAVDFCTL technique performs image pre-processing using the median-enhanced Wiener filter (MEWF) technique to improve clarity and reduce noise. For object detection (OD), the highly accurate YOLOv8 technique is utilized, followed by feature extraction through a deep fusion-based convolutional transform learning backbone combining VGG16, CapsNet, and EfficientNetB7 to capture both spatial and hierarchical features across varying scales. Moreover, the graph convolutional network (GCN) technique is employed for long-term target detection and tracking. Finally, the hybrid nonlinear whale optimization algorithm with sine cosine (SCWOA) is implemented for the optimal choice of the hyperparameters of the GCN technique. The experimental study of the LTTDT–UAVDFCTL approach is performed on the VisDrone dataset. The performance validation of the LTTDT–UAVDFCTL approach showed a superior mAP of 80.13% over existing models.
AB - Unmanned aerial vehicles (UAVs) have become increasingly available devices widely used as environmental monitoring systems. With the benefit of high mobility, UAVs power various significant computer vision (CV) applications, offering greater effectiveness and accessibility than surveillance cameras with a fixed view, angle, and scale. Nevertheless, owing to camera motion and complex environments, target detection from UAVs is difficult; conventional models frequently miss detections and raise false alarms. Drone-mounted cameras monitor objects at changing altitudes, leading to substantial scale variations. The proposed model increases target accuracy and decreases false positives using real-time data and machine learning (ML) methods. Its applications range from military operations to urban planning and wildlife monitoring. Therefore, this study develops a novel long-term target detection and tracking model for unmanned aerial vehicles using a deep fusion-based convolutional transform learning (LTTDT–UAVDFCTL) model. The LTTDT–UAVDFCTL model aims to improve the robustness and accuracy of target detection and tracking in scale-variant environments. First, the LTTDT–UAVDFCTL technique performs image pre-processing using the median-enhanced Wiener filter (MEWF) technique to improve clarity and reduce noise. For object detection (OD), the highly accurate YOLOv8 technique is utilized, followed by feature extraction through a deep fusion-based convolutional transform learning backbone combining VGG16, CapsNet, and EfficientNetB7 to capture both spatial and hierarchical features across varying scales. Moreover, the graph convolutional network (GCN) technique is employed for long-term target detection and tracking. Finally, the hybrid nonlinear whale optimization algorithm with sine cosine (SCWOA) is implemented for the optimal choice of the hyperparameters of the GCN technique. The experimental study of the LTTDT–UAVDFCTL approach is performed on the VisDrone dataset. The performance validation of the LTTDT–UAVDFCTL approach showed a superior mAP of 80.13% over existing models.
KW - Computer vision
KW - Convolutional transform learning
KW - Fusion model
KW - Long-term target detection
KW - Scale-variant environments
KW - Unmanned aerial vehicles
UR - https://www.scopus.com/pages/publications/105012458517
U2 - 10.1038/s41598-025-09652-1
DO - 10.1038/s41598-025-09652-1
M3 - Article
C2 - 40753260
AN - SCOPUS:105012458517
SN - 2045-2322
VL - 15
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 28248
ER -