TY - JOUR
T1 - Machine Learning Approaches for Predicting Risk of Cardiometabolic Disease among University Students
AU - Musleh, Dhiaa
AU - Alkhwaja, Ali
AU - Alkhwaja, Ibrahim
AU - Alghamdi, Mohammed
AU - Abahussain, Hussam
AU - Albugami, Mohammed
AU - Alfawaz, Faisal
AU - El-Ashker, Said
AU - Al-Hariri, Mohammed
N1 - Publisher Copyright: © 2024 by the authors.
PY - 2024/3
Y1 - 2024/3
AB - Obesity is an increasingly prevalent health concern among adolescents and carries significant risks, including cardiometabolic diseases (CMDs). Early detection and diagnosis of CMD are essential for better outcomes. This study aims to build a reliable artificial intelligence model that predicts CMD using several machine learning techniques. Five classifiers are compared: Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Logistic Regression (LR), Random Forest (RF), and Gradient Boosting. A previously unused “risk level” feature, derived by applying fuzzy logic to the Conicity Index, is introduced to enhance the interpretability and discriminatory properties of the proposed models. As the Conicity Index scores indicate CMD risk, two separate models are developed, one for each gender. The performance of the proposed models is assessed using two datasets obtained from 295 records of undergraduate students in Saudi Arabia, comprising 121 male and 174 female students with diverse risk levels. Notably, Logistic Regression is the top performer among males, with an accuracy of 91%, while Gradient Boosting lags at 72%. Among females, both Support Vector Machine and Logistic Regression lead with an accuracy of 87%, while Random Forest performs worst at 80%.
KW - cardiometabolic disease
KW - CMD risk prediction
KW - machine learning
UR - https://www.scopus.com/pages/publications/85188796527
U2 - 10.3390/bdcc8030031
DO - 10.3390/bdcc8030031
M3 - Article
AN - SCOPUS:85188796527
SN - 2504-2289
VL - 8
JO - Big Data and Cognitive Computing
JF - Big Data and Cognitive Computing
IS - 3
M1 - 31
ER -