TY - GEN
T1 - Machine Learning Techniques to Predict Academic Performance of Health Sciences Students
AU - Alharthi, Hana
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - Predicting the academic performance of health sciences students before they are fully engaged in academic studies can identify those who may need early intervention. Machine learning (ML), a branch of artificial intelligence, can be used to predict the academic performance of such students and to identify the factors that continue to affect it. Objective: To use a best-fit ML model to predict the academic performance of health sciences students and rank the most important factors affecting their performance. Method: The academic records of 3468 students were extracted from the student information system (SIS); the records included preparatory year grade point average (GPA), high school GPA, Achievement Test (AT), General Aptitude Test (GAT), and cumulative GPA upon graduation. Multiple machine learning algorithms were used to develop the best-fit model to predict students' GPA and identify the factors that contributed to GPA. Results: The best-performing classifier based on area under the curve (AUC) was random forest (.773), followed by naive Bayes (.758), support vector machine (.686), k-nearest neighbors (.684), and decision tree (.658). The three scoring methods showed that preparatory year GPA, gender, and high school GPA were the top variables predicting students' cumulative GPAs. Conclusion: The random forest model can help college administrators and faculty in health colleges predict which students are more likely to underperform during their undergraduate studies.
AB - Predicting the academic performance of health sciences students before they are fully engaged in academic studies can identify those who may need early intervention. Machine learning (ML), a branch of artificial intelligence, can be used to predict the academic performance of such students and to identify the factors that continue to affect it. Objective: To use a best-fit ML model to predict the academic performance of health sciences students and rank the most important factors affecting their performance. Method: The academic records of 3468 students were extracted from the student information system (SIS); the records included preparatory year grade point average (GPA), high school GPA, Achievement Test (AT), General Aptitude Test (GAT), and cumulative GPA upon graduation. Multiple machine learning algorithms were used to develop the best-fit model to predict students' GPA and identify the factors that contributed to GPA. Results: The best-performing classifier based on area under the curve (AUC) was random forest (.773), followed by naive Bayes (.758), support vector machine (.686), k-nearest neighbors (.684), and decision tree (.658). The three scoring methods showed that preparatory year GPA, gender, and high school GPA were the top variables predicting students' cumulative GPAs. Conclusion: The random forest model can help college administrators and faculty in health colleges predict which students are more likely to underperform during their undergraduate studies.
KW - Algorithms
KW - Classifiers
KW - GPA
KW - Machine learning
KW - ML
UR - https://www.scopus.com/pages/publications/85125657678
U2 - 10.1109/DCABES52998.2021.00015
DO - 10.1109/DCABES52998.2021.00015
M3 - Conference contribution
AN - SCOPUS:85125657678
T3 - Proceedings - 2021 20th International Symposium on Distributed Computing and Applications for Business Engineering and Science, DCABES 2021
SP - 33
EP - 36
BT - Proceedings - 2021 20th International Symposium on Distributed Computing and Applications for Business Engineering and Science, DCABES 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 20th International Symposium on Distributed Computing and Applications for Business Engineering and Science, DCABES 2021
Y2 - 10 December 2021 through 12 December 2021
ER -